Even copying selected text out of a PDF file can be unpleasant. Often 
the copied text contains no newlines, so words that were visually 
separated by a line break may run together.
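
To illustrate the run-together problem, here is a minimal sketch in Python. It assumes you can get the visual lines as separate strings, which a plain copy and paste often will not give you. The function name join_copied_lines is made up for this example, and the trailing-hyphen rule is only a heuristic.

```python
def join_copied_lines(lines):
    """Join visually wrapped lines into a single string.

    Assumption: each item in `lines` is one visual line from the page.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if pieces and pieces[-1].endswith("-"):
            # Heuristic: treat a trailing hyphen as a word split across
            # lines. This is wrong for genuinely hyphenated words.
            pieces[-1] = pieces[-1][:-1] + line
        else:
            pieces.append(line)
    # A line break stood in for a space, so put the space back.
    return " ".join(pieces)


lines = ["words may run", "together when", "visu-", "ally separated"]
naive = "".join(lines)             # line breaks simply dropped
repaired = join_copied_lines(lines)
```

Here naive contains "runtogether", which is the effect described above, while repaired separates the words again.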

On Thursday, June 22, 2023 at 8:52:14 AM UTC-4 David Szent-Györgyi wrote:

> On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:
>
> Very thoughtful piece by Jon Udell - Why LLM-assisted table 
> transformation is a big deal 
> <https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/>.
>
>  
> In my day job, I have to pull useful items out of PDFs - pictures, text, 
> and tables. PDFs often make this difficult, both because of 
> password-protected access and because information that renders as neatly 
> organized text and tables when printed or displayed in a viewer is not 
> stored that way in the file: the data in the PDF requires rearrangement. 
> Jon Udell's article mentions this without discussing the specifics of the 
> articles he processes. 
>
> It is true that tools like ChatGPT are trained on text and as such are 
> most likely to work on text, but they do not reason about non-text. I would 
> argue that a PDF is non-text, and as such, recreating neatly organized text 
> and tables from it is error-prone; if we really value the facts in a 
> technical publication, we need to start with a suitable source, which 
> probably needs carefully done markup created by experts in the subject 
> matter of the publication. 
>
> I would not trust a complex table produced by ChatGPT, since not only is 
> it not a subject matter expert, it also cannot reason as a human being can 
> when making sense of such a document. 
>
> I don't know what to say about the extraordinary domain of software that 
> produces those PDFs. How many of those software applications incorporate 
> features meant to allow exploration of the structure of a document? This 
> sounds to me like the sort of job for which Leo is well-equipped! 
>

