Peter Schaffter wrote:
 |On Fri, Sep 08, 2017, Ralph Corderoy wrote:
 |>> You'll notice that the top of the pdf file has a line of text spit out
 |>> by grep(1) that obviously shouldn't be there.
 |>
 |> I couldn't come up with the groff 1.22.3-7 command line required to
 |> build the PDF correctly, nor get grep's unwanted output. Deri suggested
 |> pdfmom's grep might be the culprit, but its stderr should end up on
 |> pdfmom's stderr?
 |
 |Problem solved.
 |
 |The superfluous line at the top of the file ["Binary file (standard
 |input) matches"] isn't stderr, it's stdout, so it becomes part of
 |the pipeline. The grep in pdfmom is returning a binary file hit when
 |it encounters the diacritic in
 |
 |  .ds pdf:look(pdf:bm1) L'étranger
 |
 |Since the binary file hit doesn't begin with .ds, it prints literally
 |at the top of the file.
 |
 |The solution is to pass the -a flag to grep.
 |
 |Deri: do you want me to fix this in pdfmom and push the change, or
 |would you prefer to do it yourself?
 |
 |Question: why does grep treat the presence of the diacritic as cause
 |for saying "Binary file (standard input) matches"?
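A small check shows what grep is probably reacting to. It assumes (the message does not say) that the file was saved as Latin-1 and that grep ran in a UTF-8 locale. GNU grep treats input containing byte sequences that are invalid in the current locale's encoding as binary data. The helper `is_valid` is a hypothetical stand-in for that test, not grep's own code.

```python
# Minimal sketch: does a byte string decode cleanly in a given encoding?
# This mimics the idea behind grep's "binary file" classification.
# Assumption: the groff source is Latin-1 and the locale is UTF-8.


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text = ".ds pdf:look(pdf:bm1) L'étranger"
line_latin1 = text.encode("latin-1")  # é is the single byte 0xE9
line_utf8 = text.encode("utf-8")      # é is the two bytes 0xC3 0xA9

# 0xE9 starts a 3-byte UTF-8 sequence, but the next byte ('t') is not a
# continuation byte, so a UTF-8 decoder reports an encoding error.
# Every byte value is a valid Latin-1 character, so Latin-1 never fails.
print(is_valid(line_latin1, "utf-8"))    # False
print(is_valid(line_latin1, "latin-1"))  # True
print(is_valid(line_utf8, "utf-8"))      # True
```

With `-a` (equivalent to `--binary-files=text`), grep skips this classification and prints matching lines as text, which is why the proposed fix works.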
Likely because that is true in your locale? It is very likely that this
cannot work (I see that -k could possibly come into play): suppose you
are in a LATIN1 locale and process UTF-8 input, and it is even worse
when your own locale is pickier than LATIN1. It strikes me that this
should be split up so that Perl itself performs the grep, in a
charset-agnostic mode. Even very large documents should not hit any
limit here; otherwise, there is no problem creating the two pipelines
concurrently ...

--steffen
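The suggestion to let the script do the filtering itself, in a charset-agnostic way, can be sketched as follows. pdfmom is a Perl script; this is only a Python illustration of the idea, not a patch to pdfmom, and `grep_ds_lines` is a hypothetical name. It assumes, as the quoted message implies, that the grep only needs to keep lines beginning with `.ds`. Because it compares raw bytes, it never consults the locale, so a Latin-1 or a UTF-8 é passes through untouched instead of triggering a "Binary file" message.

```python
import sys


def grep_ds_lines(data: bytes) -> bytes:
    """Return only the lines of `data` that start with b".ds".

    Works purely on bytes: nothing is decoded, so input in any
    encoding is passed through unchanged and no "binary" mode exists.
    """
    kept = [
        line
        for line in data.splitlines(keepends=True)
        if line.startswith(b".ds")
    ]
    return b"".join(kept)


if __name__ == "__main__":
    sample = (
        b".ds pdf:look(pdf:bm1) L'\xe9tranger\n"      # Latin-1 e-acute
        b".ds pdf:look(pdf:bm2) L'\xc3\xa9tranger\n"  # UTF-8 e-acute
        b"some other diagnostic line\n"
    )
    # Write bytes straight to stdout; no text decoding on the way out.
    sys.stdout.buffer.write(grep_ds_lines(sample))
```

The trade-off of a bytes-level filter is that it can only match on ASCII structure such as the `.ds` prefix; any interpretation of the non-ASCII text is left to groff further down the pipeline.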
