Hi Roberto,
Try rebuilding the Discovery index with the command:

bin/dspace index-discovery -b
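
If it helps, the two steps (text extraction, then a full index rebuild) can be sketched as a small script. This is only a sketch under assumptions: the /dspace install path is taken from the command quoted below, and DSPACE_DIR, DRY_RUN, run_step and reindex_fulltext are hypothetical names made up for this example, not part of DSpace. It uses -f, the short form of the filter-media force option.

```shell
#!/bin/sh
# Sketch: regenerate extracted text, then rebuild the Discovery index.
# DSPACE_DIR is an assumption (path quoted in this thread); adjust as needed.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# run_step (hypothetical helper): print the command, then execute it
# unless DRY_RUN=1 is set.
run_step() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@" || return 1
    fi
}

# reindex_fulltext (hypothetical helper): run both steps in order and
# stop at the first failure.
reindex_fulltext() {
    # 1. Re-run the media filters; -f forces already-filtered files to be redone.
    run_step "$DSPACE_DIR/bin/dspace" filter-media -f || return 1
    # 2. Rebuild the Discovery index so the extracted text becomes searchable.
    run_step "$DSPACE_DIR/bin/dspace" index-discovery -b || return 1
}

# To run for real, call: reindex_fulltext
```

Calling it with DRY_RUN=1 set only prints the two commands, which is a cheap way to check the paths before touching the index.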

On Tuesday, 19 November 2024 at 15:24:23 UTC+3 Roberto Greiner wrote:

> Hi,
>
> I've installed DSpace 8 recently. I'm starting some tests, and one thing 
> I can't figure out how to make work is searching inside PDFs.
>
> For now, to make debugging easier, I only have one PDF and one Word 
> document (the Word document only has the word "test" in it) in my DSpace 
> install. I've run '/dspace/bin/dspace filter-media -force' to make sure 
> that everything is indexed (the same command is in my crontab, set to 
> run at 3:00AM).
>
> The filter-media command does not give any errors, and reports the 
> following output:
> The script has started
> File: TESTE-DSPACE-FUNDUNESP.docx.txt
> FILTERED: bitstream 58d788ce-b373-49bf-bb38-c690486ce3d2 (item: 
> 123456789/3) and created 'TESTE-DSPACE-FUNDUNESP.docx.txt'
> File: D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.txt
> FILTERED: bitstream ed827acc-4974-4be2-a070-bd8d05d10bc0 (item: 
> 123456789/4) and created 'D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.txt'
> File: D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.jpg
> FILTERED: bitstream ed827acc-4974-4be2-a070-bd8d05d10bc0 (item: 
> 123456789/4) and created 'D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.jpg'
> The script has completed
>
>
>
> In the dspace log, I've found the following output:
> 2024-11-19 09:07:02,292 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=114257ff-e066-494c-9166-de0ed9a56849
> 2024-11-19 09:07:03,233 INFO  unknown unknown 
> org.dspace.discovery.SolrServiceImpl @ 
> anonymous::indexed_object:Item-114257ff-e066-494c-9166-de0ed9a56849
> 2024-11-19 09:07:04,478 INFO  unknown unknown 
> org.dspace.scripts.handler.impl.CommandLineDSpaceRunnableHandler @ File: 
> D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.txt
> 2024-11-19 09:07:07,502 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font TimesNewRomanPS-ItalicMT
> 2024-11-19 09:07:07,519 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font CourierNewPSMT
> 2024-11-19 09:07:07,532 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font TimesNewRomanPSMT
> 2024-11-19 09:07:07,538 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font TimesNewRomanPS-BoldMT
> 2024-11-19 09:07:07,557 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font ArialMT
> 2024-11-19 09:07:07,643 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font Arial-BoldMT
> 2024-11-19 09:07:07,703 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font CourierNewPS-BoldMT
> 2024-11-19 09:07:07,814 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font Arial-BoldItalicMT
> 2024-11-19 09:07:07,885 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font Arial-ItalicMT
> 2024-11-19 09:07:07,920 WARN  unknown unknown 
> org.apache.fontbox.ttf.PostScriptTable @ No PostScript name data is 
> provided for the font TimesNewRomanPS-BoldItalicMT
> 2024-11-19 09:07:10,235 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::update_bitstream:bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,235 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::create_bitstream:bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,374 INFO  unknown unknown org.dspace.content.Bundle 
> @ 
> anonymous::add_bitstream:bundle_id=7956cda0-eac0-4c76-9f4b-cb84020b8677,bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,377 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=a33076d7-92b2-4710-a4d0-9b0c7dd0f1ee
> 2024-11-19 09:07:10,390 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=a33076d7-92b2-4710-a4d0-9b0c7dd0f1ee
> 2024-11-19 09:07:10,436 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::update_bitstream:bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,438 INFO  unknown unknown 
> org.dspace.content.MetadataValueServiceImpl @ 
> anonymous::add_metadatavalue:metadata_value_id=167
> 2024-11-19 09:07:10,440 INFO  unknown unknown 
> org.dspace.content.MetadataValueServiceImpl @ 
> anonymous::add_metadatavalue:metadata_value_id=168
> 2024-11-19 09:07:10,441 INFO  unknown unknown 
> org.dspace.content.MetadataValueServiceImpl @ 
> anonymous::add_metadatavalue:metadata_value_id=169
> 2024-11-19 09:07:10,445 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::update_bitstream:bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,464 INFO  unknown unknown org.dspace.content.Bundle 
> @ 
> anonymous::remove_bitstream:bundle_id=7956cda0-eac0-4c76-9f4b-cb84020b8677,bitstream_id=3078e743-5bdd-4b10-a024-c870da19e005
> 2024-11-19 09:07:10,464 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=a33076d7-92b2-4710-a4d0-9b0c7dd0f1ee
> 2024-11-19 09:07:10,465 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::update_bitstream:bitstream_id=fc0f3b71-28b4-4d64-891c-0b359f767df7
> 2024-11-19 09:07:10,465 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=a33076d7-92b2-4710-a4d0-9b0c7dd0f1ee
> 2024-11-19 09:07:10,468 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::delete_bitstream:bitstream_id=3078e743-5bdd-4b10-a024-c870da19e005
> 2024-11-19 09:07:10,469 INFO  unknown unknown 
> org.dspace.content.BitstreamServiceImpl @ 
> anonymous::update_bitstream:bitstream_id=3078e743-5bdd-4b10-a024-c870da19e005
> 2024-11-19 09:07:10,482 INFO  unknown unknown 
> org.dspace.scripts.handler.impl.CommandLineDSpaceRunnableHandler @ 
> FILTERED: bitstream ed827acc-4974-4be2-a070-bd8d05d10bc0 (item: 
> 123456789/4) and created 'D177AF23-EF76-4765-A3EE-E6F91E2C8E3A.pdf.txt'
> 2024-11-19 09:07:10,483 INFO  unknown unknown 
> org.dspace.content.ItemServiceImpl @ 
> anonymous::update_item:item_id=a33076d7-92b2-4710-a4d0-9b0c7dd0f1ee
>
>
> Still, when I open the DSpace UI and search in the community or in the 
> collection, the search only returns matches on the manually entered 
> metadata, nothing from inside the PDF. My conclusion is that I'm 
> obviously missing something. Another sign that I'm doing something 
> wrong is that the thumbnail is not being generated for either file.
>
> Could somebody help me find where my mistake is?
>
> Thank you,
>
> Roberto
>
>
>
> -- 
> -----------------------------------------------------
> Marcos Roberto Greiner
>
> Optimists think we live in the best of all worlds
> Pessimists fear that this is true
> James Branch Cabell
> -----------------------------------------------------
>
>
