[Jgeneral] Parsing PDF documents in J

Alex Rufon Tue, 27 Jan 2009 23:27:35 -0800

One of the uses of J in our system is parsing EDI, CSV and Excel files
and importing them into our databases (MS-SQL/Oracle).

One type of file that I have not been importing is Purchase Orders (POs)
coming from clients in PDF format. I have discussed it with the team
that maintains the code for that, and it seems that to import POs in
PDF format they have to go through these steps:

1. Export the PDF file to HTML

2. Parse the HTML file

3. Insert/update the databases

I found out that the parsing step is a bit mind-boggling, because
exporting the PDF to HTML adds complications of its own to the
parsing process.
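
To give a feel for the kind of complication I mean, here is a minimal
sketch. It is in Python rather than J, purely for illustration, and the
markup is made up: I am assuming the exporter writes one absolutely
positioned <div> per text fragment, which may not match what our
exporter really produces. The names SAMPLE_HTML, FragmentCollector and
rows_from_html are hypothetical. The point is that the fragments do not
arrive in reading order, so the parser has to regroup them by position
before it can find any fields.

```python
from html.parser import HTMLParser
import re

# Hypothetical exporter output (an assumption, not real data):
# one absolutely positioned <div> per text fragment, in arbitrary order.
SAMPLE_HTML = """
<div style="position:absolute;top:100px;left:200px">PO-1001</div>
<div style="position:absolute;top:100px;left:50px">PO Number:</div>
<div style="position:absolute;top:120px;left:50px">Qty:</div>
<div style="position:absolute;top:120px;left:200px">12</div>
"""


class FragmentCollector(HTMLParser):
    """Collect (top, left, text) for every positioned <div>."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._pos = None  # position of the <div> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            style = dict(attrs).get("style") or ""
            top = re.search(r"top:(\d+)px", style)
            left = re.search(r"left:(\d+)px", style)
            if top and left:
                self._pos = (int(top.group(1)), int(left.group(1)))

    def handle_data(self, data):
        text = data.strip()
        if self._pos is not None and text:
            self.fragments.append((*self._pos, text))

    def handle_endtag(self, tag):
        if tag == "div":
            self._pos = None


def rows_from_html(html):
    """Regroup fragments into rows: same 'top' value, ordered by 'left'."""
    collector = FragmentCollector()
    collector.feed(html)
    rows = {}
    for top, left, text in sorted(collector.fragments):
        rows.setdefault(top, []).append(text)
    return [rows[top] for top in sorted(rows)]


print(rows_from_html(SAMPLE_HTML))
```

Even in this toy case the layout has to be rebuilt from pixel
coordinates before "PO Number:" can be matched up with its value, and
real exports would also need some tolerance for fragments that sit a
pixel or two off the same line.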

 

So does anybody have a suggestion? Can we read the PDF file directly
into J and parse it there?

I guess at this point it's obvious that I don't know enough to ask the
right questions, so any suggestions or ideas are appreciated.

Thanks.

 

r/Alex

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
