Do you want input (pdf) and output (text) files?

--Bob
________________________________
From: Robert Rodini <[email protected]>
Sent: Friday, June 9, 2023 9:48 AM
To: [email protected] <[email protected]>
Subject: Re: extract utility request

Tilman,

The -sort flag does not produce the desired results. I need the output to process the first column from top to bottom, then the middle column from top to bottom, then the third column...

Maybe there's no way to do this.

Thanks,
Bob Rodini
________________________________
From: Tilman Hausherr <[email protected]>
Sent: Thursday, June 8, 2023 10:41 PM
To: [email protected] <[email protected]>
Subject: Re: extract utility request

-sort

https://pdfbox.apache.org/2.0/commandline.html#extracttext

Tilman

On 08.06.2023 22:38, Robert Rodini wrote:
> Hi,
> I have successfully used PDFBox ExtractText utility to process PDFs produced
> by a third-party. The text comes out of a multicolumn PDF in the left to
> right order of the columns from top to bottom.
>
> I now have to process PDFs produced by another third-party which also
> produces a multicolumn PDF. This time the text comes out in an unpredictable
> order.
>
> I've read the FAQ
> https://pdfbox.apache.org/2.0/faq.html
> regarding "Why does the extracted text appear in the wrong sequence?"
>
> I'd like to know if there is a command line switch (or something) that I can
> do to get the text extracted in the correct order? Can I request a CLI
> switch to the ExtractText utility? How to do this?
>
> Thanks,
> Bob Rodini
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
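
The ordering requested in this thread (first column top to bottom, then the middle column, then the third) can be illustrated independently of PDFBox. The sketch below is not an ExtractText feature and not something proposed by the PDFBox developers. It assumes the text fragments and their page coordinates are already available (in PDFBox they might be collected from the TextPosition objects a PDFTextStripper subclass receives, or the columns might be read one at a time with PDFTextStripperByArea), and it assumes equal-width columns with y growing downward. The names ColumnOrder, Fragment, columnOf and readingOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of PDFBox: orders text fragments column by column.
final class ColumnOrder {

    // A piece of text and the position of its top-left corner.
    // Assumption: y grows downward, so a smaller y is higher on the page.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Zero-based index of the column containing x.
    // Assumption: the page is split into equal-width columns.
    static int columnOf(double x, double pageWidth, int columns) {
        int col = (int) (x / (pageWidth / columns));
        // Clamp so that coordinates on or beyond the page edge still map to a valid column.
        return Math.max(0, Math.min(columns - 1, col));
    }

    // Returns the fragment texts sorted by column first, then top to bottom within a column.
    static List<String> readingOrder(List<Fragment> fragments, double pageWidth, int columns) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator
                .comparingInt((Fragment f) -> columnOf(f.x, pageWidth, columns))
                .thenComparingDouble(f -> f.y));
        List<String> texts = new ArrayList<>();
        for (Fragment f : sorted) {
            texts.add(f.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Made-up fragments in an arbitrary order on a 600-unit-wide, three-column page.
        List<Fragment> page = Arrays.asList(
                new Fragment("B1", 250, 100),
                new Fragment("A2", 40, 300),
                new Fragment("C1", 450, 100),
                new Fragment("A1", 40, 100),
                new Fragment("B2", 250, 300));
        System.out.println(readingOrder(page, 600, 3)); // prints [A1, A2, B1, B2, C1]
    }
}
```

Real documents rarely have perfectly equal columns, so the column boundaries would probably need to be measured per document layout rather than derived from the page width.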
