Hi Thomas,

Here is a version similar to the original .pdf I was talking about:
http://www.gimpel.com/html/manual.pdf

I think you are right about the issue being with file size instead of file 
content.

When I converted the original pdf file to text and back to pdf, its size 
changed drastically. The original is about 4.5 MB while the converted one was 
around 1 MB.

I proceeded to push a bigger .pdf file (around 11.1 MB, with no encryption) 
and I got the same trace error, so apparently it isn’t about the file being 
encrypted or not.
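
For anyone who wants to reproduce this without the original manual, here is a 
minimal sketch that writes a dummy file of a chosen size. It assumes that only 
the size matters: the file holds random bytes and is not a valid PDF. The 
function name and the chunk size are illustrative choices, not anything taken 
from Kallithea.

```python
import os


def write_dummy_file(path, size_bytes, chunk=1024 * 1024):
    """Write `size_bytes` of random data to `path`; return the resulting file size."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            # Write in chunks so large files do not need to fit in memory at once.
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)
```

Calling, for example, write_dummy_file("dummy_11MB.bin", 11 * 1000 * 1000) and 
pushing the result should show whether a file of roughly that size fails 
regardless of its content. Random bytes barely compress, so the pushed data 
should stay close to the file size.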

I am getting a similar trace when using uwsgi. I played around with the buffer 
size, but to no avail.
Here is another sample log file from when I try to push a big .pdf file to a 
git repo on Kallithea 0.3.5 using uwsgi as the web server.

https://pastebin.com/Y1piX0vX

Tomorrow, I will try to test with the default branch instead of 0.3.5.

Thanks,

Mat

From: Thomas De Schampheleire [mailto:[email protected]]
Sent: Wednesday, June 13, 2018 4:52 PM
To: Matey Chopov <[email protected]>
Cc: [email protected]
Subject: Re: Potential issue with pdf files and git repos

On Wed, Jun 13, 2018, 22:29 Thomas De Schampheleire 
<[email protected]> wrote:
2018-06-13 20:39 GMT+02:00 Matey Chopov <[email protected]>:
> Hi,
>
> It looks like it happens with a specific .pdf manual. I tested it with 
> another .pdf file and the exception didn't occur; the file got pushed 
> correctly.
>
> Here's the line in the trace I think is the most interesting:
>
> DatabaseError: (DatabaseError) file is encrypted or is not a database 
> u'SELECT ui.ui_id AS ui_ui_id, ui.ui_section AS ui_ui_section, ui.ui_key AS 
> ui_ui_key, ui.ui_value AS ui_ui_value, ui.ui_active AS ui_ui_active \nFROM ui 
> \nWHERE ui.ui_key = ?' ('push_ssl',)
> 2018-06-13 11:07:04.946 ERROR [waitress] Exception when servicing 
> <waitress.channel.HTTPChannel connected 0.0.0.0:12756 at 0x7f06c5923e90>
>
> I have uploaded the surrounding log on pastebin: 
> https://pastebin.com/n9fY4xae
>
> So the problematic pdf that I thought wasn't encrypted was actually 
> encrypted with RC4. The weird thing is that in Git Extensions you can still 
> see the file contents in the "diff" section.
>
> Apparently, pdf readers automatically decrypt such files if there is no 
> password (which is the current case).
>
> I used qpdf to decrypt the file (with no password) which gave another valid 
> .pdf file with no encryption (at least that's what I get when I analyze the 
> file with pdfinfo).
>
> Tried pushing that file too, but it still failed.
>
> I played with the pdf headers, changed the Creator and Producer values to the 
> ones of a .pdf file I know could be uploaded. Same error.
>
> I tried converting the file from pdf1.3 to pdf1.4, same issue.
>
> So, what finally worked for me was converting from pdf to ps, then to text, 
> then from the text file, to ps, and then to pdf. The indexing table got 
> screwed, but that doesn't really bother me. Finally, pushed the new pdf file 
> to the git repo with success.
>
> Commands:
>
> pdftops test.pdf test.ps
> ps2txt test.ps test.txt
>
> enscript -B --margins=10:10 -o test.ps -f [email protected]/1 test.txt
> ps2pdf test.ps test_last.pdf
>

Your log also shows:

Traceback (most recent call last):
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/task.py", line 74, in handler_thread
    task.service()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/channel.py", line 368, in service
    request._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/parser.py", line 249, in _close
    body_rcv.getbuf()._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 303, in _close
    buf._close()
  File "/opt/Kallithea/local/lib/python2.7/site-packages/waitress/buffers.py", line 110, in _close
    self.file.close()
IOError: [Errno 9] Bad file descriptor


which reminds me of the following two open issues:

https://bitbucket.org/conservancy/kallithea/issues/219/waitress-exception-when-serving-file
https://bitbucket.org/conservancy/kallithea/issues/229/bad-file-descriptor



Is the PDF on which you see the issue something you could share?
Or could you create another PDF with dummy data that also exhibits the issue?

If at all possible, could you test with the default branch of Kallithea
instead of 0.3.5?

Note that it doesn't make sense to me that the contents of the file would 
matter. I think it is more likely about the file size. Could you check the file 
sizes of the different files you tested with?
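
A minimal sketch for that comparison, assuming the test files are reachable 
from the current directory. The file names are placeholders, not the actual 
files, and both helper functions are made up for illustration.

```python
import os


def size_mb(path):
    """Return the size of the file at `path` in megabytes (10**6 bytes)."""
    return os.path.getsize(path) / 1e6


def report_sizes(paths):
    """Return (path, size in MB) pairs for the paths that exist as regular files."""
    return [(p, size_mb(p)) for p in paths if os.path.isfile(p)]


if __name__ == "__main__":
    # Placeholder names; replace with the PDFs that were actually pushed.
    for path, mb in report_sizes(["test.pdf", "test_last.pdf"]):
        print("%-20s %8.2f MB" % (path, mb))
```

Listing the sizes of the files that pushed fine next to those that failed 
should show whether there is a size threshold.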

Also, the reporter of issue #219 reported back that his issue was gone after 
switching away from waitress to another web server, in his case uwsgi. Could 
you try that too?

Thanks,
Thomas
_______________________________________________
kallithea-general mailing list
[email protected]
https://lists.sfconservancy.org/mailman/listinfo/kallithea-general
