We don’t currently include font information in the XHTML that Tika outputs for PDFs. I _think_ it wouldn’t be too hard to add as <span.../> elements.
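In the meantime, here is a rough sketch of what you can pull out of the XHTML today. It is only an illustration: `SAMPLE_XHTML` is a made-up document shaped like the per-page `<div class="page">` / `<p>` output I would expect for a PDF, and `list_paragraphs` is a hypothetical helper, not part of tika-python. It lists each paragraph with its page number and whatever attributes the tag carries, which at the moment is nothing font-related.

```python
import xml.etree.ElementTree as ET

# Namespace used by XHTML elements.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample, shaped like the XHTML returned for a PDF.
# Real output comes from parser.from_file(path, xmlContent=True)["content"].
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>example</title></head>
<body>
<div class="page"><p>1 Introduction</p><p>Some body text.</p></div>
<div class="page"><p>2 Methods</p></div>
</body>
</html>"""


def list_paragraphs(xhtml):
    """Return (page_number, text, attributes) for every non-empty <p>."""
    root = ET.fromstring(xhtml)
    rows = []
    page_no = 0
    for div in root.iter(XHTML_NS + "div"):
        # Only divs marked as pages are counted.
        if div.get("class") != "page":
            continue
        page_no += 1
        for p in div.iter(XHTML_NS + "p"):
            text = "".join(p.itertext()).strip()
            if text:
                # p.attrib is where font details would appear if they existed.
                rows.append((page_no, text, dict(p.attrib)))
    return rows


if __name__ == "__main__":
    for page_no, text, attrs in list_paragraphs(SAMPLE_XHTML):
        print(page_no, repr(text), attrs)
```

To try it on a real file, pass the string from `parsed["content"]` (from the `xmlContent=True` call quoted below) instead of `SAMPLE_XHTML`. The attribute dictionaries should come back empty, which matches what Jay is seeing.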
On Wed, Oct 16, 2019 at 5:37 AM Jay Chuk <jaychuk2...@gmail.com> wrote:

> Thanks Chris
> I did that already, but within the tags (like the paragraph tags) there is
> no information on the font size or the type of font used.
>
> It only prints out the text.
>
> Regards,
> Jay
>
> On Tue, Oct 15, 2019, 6:56 PM Chris Mattmann <mattm...@apache.org> wrote:
>
> > When you do a parse, do this:
> >
> > from tika import parser
> > parsed = parser.from_file('/path/to/file', xmlContent=True)
> > xmlContent = parsed["content"]
> > print(xmlContent)
> >
> > G’luck!
> >
> > Cheers
> > Chris
> >
> > *From: *Jay Chuk <jaychuk2...@gmail.com>
> > *Date: *Tuesday, October 15, 2019 at 3:54 PM
> > *To: *Chris Mattmann <mattm...@apache.org>
> > *Cc: *"dev@tika.apache.org" <dev@tika.apache.org>
> > *Subject: *Re: [EXTERNAL] Extracting font information from xml
> >
> > Thanks for the quick reply Chris.
> >
> > Please, is there a possible code snippet in Python for it?
> >
> > Regards,
> > Jay
> >
> > On Tue, Oct 15, 2019 at 6:52 PM Chris Mattmann <mattm...@apache.org>
> > wrote:
> >
> > Hi Jay, yes, I believe so. Tika Python is just a thin client to Tika
> > Server and it provides this functionality. CC’ing dev@tika
> >
> > *From: *Jay Chuk <jaychuk2...@gmail.com>
> > *Date: *Tuesday, October 15, 2019 at 3:47 PM
> > *To: *"Mattmann, Chris A (US 1761)" <chris.a.mattm...@jpl.nasa.gov>
> > *Subject: *[EXTERNAL] Extracting font information from xml
> >
> > Hi Chris,
> >
> > Thanks for providing the Python package, Tika, for extracting text
> > from PDFs.
> >
> > I'd like to confirm whether it is possible, when converting PDF to XML,
> > to get the font style for the text, e.g. the font type, or whether the
> > text is bold/solid.
> >
> > I need such information for identifying section headers and titles in
> > the documents.
> >
> > Please let me know if it is possible or if there is another way to go
> > about this.
> >
> > Thank you
> >
> > Jay