On 13/03/2012 10.35, Gert Schmeltz Pedersen wrote:
Have you looked into fedoragsearch.log? What does it say when the PDF is fetched and indexed? Also, you should upgrade to GSearch 2.4.1, because it has better logging for this, and you might use the Tika extraction functions.

Gert
Hi Gert,
thanks for your answer. We have seen that the problem is in PDFBox 1.6.0, which fails to extract the text. As for Tika, does it use PDFBox to extract text from PDF files?
best regards,
Luca


On 12/03/2012, at 14.40, Luca Lelli wrote:

Hi all,
we have installed GSearch 2.3, which uses the latest PDFBox version (1.6.0), and we tried to index a set of PDF files that contain text. However, the GSearch function GetDatastreamtext returns an empty string. These PDF files really do contain text, because we can extract it with other tools. A sample of these PDF files is 'http://magteca-fi.inera.it:80/fedora/e_ntc/2012/0308/17/31/mag_2825+MM294339116b1bf925ddb97c303d5c0f3f+MM294339116b1bf925ddb97c303d5c0f3f.0'
Do you know anything more about a problem like this one?
thanks
-- 
Luca Lelli

_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users






-- 
Luca Lelli
--------------------------
INERA srl
http://www.inera.it
Via Mazzini 138
56125 Pisa
Italy
tel: +39 050 9911815
fax: +39 050 9911830
email: [email protected]
--------------------------

