Hello everyone, I'm currently trying to plan an implementation of this and wanted to ask the opinion of the developers on how to go about it.
I have seen the great resources provided by Peter Dietz@LongSight regarding the integration of BookReader <https://github.com/internetarchive/bookreader> in DSpace, such as the video demo <https://www.youtube.com/watch?v=mZkvfxPrwZw> and the source code <https://github.com/peterdietz/DSpace/tree/bookreader/dspace-xmlui/src/main/webapp/themes/wheaton-mirage2/vendor/BookReader> .

It all works great, provided that the bitstreams contained in the item follow a specific naming convention (001.jpg, 002.jpg, 003.jpg, etc.) so that the client app can request and render them in the correct order, request page ranges, and so on.

However, searching within the document itself is disabled because, I believe, this particular feature needs a backend to supply the client app with the needed information. It can be seen in production on archive.org, for example when searching for the term *Socrates* within a book <https://archive.org/stream/in.ernet.dli.2015.50197/2015.50197.Plato#page/n105/mode/2up/search/Socrates> .

The backend of the Internet Archive's BookReader returns a JSON entry for every hit, for example:

    {
        "text": "fly towards him, nestle in his breast, and then spread its wings and soai upwards, singing most sweetly The next morning Ariston appeared, leading his son Plato to the philosopher, and {{{Socrates}}} knew that his dieam was fulfilled",
        "par": [
            {
                "boxes": [
                    { "r": 694, "b": 412, "t": 358, "page": 10, "l": 531 }
                ],
                "b": 463,
                "t": 172,
                "page_width": 1243,
                "r": 1146,
                "l": 28,
                "page_height": 2123,
                "page": 10
            }
        ]
    }

This makes sense, because with this information the client app can *1)* pinpoint the specific pages where the term is found and *2)* render the highlight box around the searched term on the page being presented, using the 'coordinates' and the page dimensions.
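As a rough illustration of point 2), here is a minimal Python sketch of how a client could turn one such hit into a highlight rectangle. It assumes that l/t/r/b are left/top/right/bottom pixel coordinates in the scanned page's own coordinate space (the sample values suggest this, but I have not confirmed it), and that the matched term is the part wrapped in {{{ }}}. The names `matched_terms` and `highlight_rects` are made up for this example and are not part of BookReader.

```python
import json
import re

# One hit as returned by the search backend (text shortened for readability).
HIT = json.loads("""
{
  "text": "... and {{{Socrates}}} knew that his dieam was fulfilled",
  "par": [
    {
      "boxes": [{"r": 694, "b": 412, "t": 358, "page": 10, "l": 531}],
      "b": 463, "t": 172, "page_width": 1243,
      "r": 1146, "l": 28, "page_height": 2123, "page": 10
    }
  ]
}
""")


def matched_terms(text):
    """Return the terms wrapped in {{{ }}} inside a hit's text snippet."""
    return re.findall(r"\{\{\{(.+?)\}\}\}", text)


def highlight_rects(hit, display_width):
    """Scale every word box of a hit to a page rendered display_width pixels wide.

    Assumption: l/t/r/b are pixel offsets from the top-left corner of the
    original scan, whose size is given by page_width and page_height.
    """
    rects = []
    for par in hit["par"]:
        scale = display_width / par["page_width"]
        for box in par["boxes"]:
            rects.append({
                "page": box["page"],
                "left": round(box["l"] * scale),
                "top": round(box["t"] * scale),
                "width": round((box["r"] - box["l"]) * scale),
                "height": round((box["b"] - box["t"]) * scale),
            })
    return rects


if __name__ == "__main__":
    print(matched_terms(HIT["text"]))
    print(highlight_rects(HIT, display_width=1243))
```

Whatever DSpace ends up serving would therefore need to carry, per hit, at least the page number, the page dimensions and the word box.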
*Assuming:*
1) All the required bitstreams are in JPEG format and follow the naming convention mentioned above;
2) The required word location information is available in ALTO.xml files (DSpace would not generate that information, it would only need to process and serve it).

*How would one have DSpace act as a backend for the BookReader client app?*

The best idea I've come up with so far is to build a custom media filter that would interpret the word information contained in the ALTO.xml files for each item and store it in a new custom SOLR index, which would afterwards be queried by the client app. Every item would have its own word index with information for each word (page, width, height, vpos, hpos). This means one index entry per word occurrence, with only the matching entries (the *hits*) being served to the client app. For example, the following query:

    <DSpaceURL>/solr/search/select?q=search.resourceid:<itemID>&fq=word.value:<searchTerm>

would return the information for all the occurrences in the *word* index with the value <searchTerm> (in the example above: Socrates).

If this could be accomplished, in theory it would work. Does anyone have other ideas on this? Has anyone implemented something similar, or thought about it before?

Sorry for the wall of text.

Thanks as always,
Pedro Amorim
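P.S. To make the ALTO idea a bit more concrete, here is a rough Python sketch of the lookup step: read one ALTO file, find the String elements whose CONTENT matches the search term, and emit the box in the l/t/r/b form the BookReader hits use. The assumptions are mine: one ALTO file per page, positions measured in pixels (ALTO also allows other measurement units), and the page number supplied by the caller, for instance from the file name. The sample XML and the `find_hits` helper are invented for illustration. In the real thing, this would live in the media filter and write to SOLR rather than return a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO file for a single page.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P10" WIDTH="1243" HEIGHT="2123">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="and" HPOS="470" VPOS="360" WIDTH="50" HEIGHT="50"/>
            <String CONTENT="Socrates" HPOS="531" VPOS="358" WIDTH="163" HEIGHT="54"/>
            <String CONTENT="knew" HPOS="705" VPOS="360" WIDTH="80" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so any ALTO schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def as_int(value):
    """ALTO positions may be written as floats; round them to whole pixels."""
    return int(round(float(value)))


def find_hits(alto_xml, page_number, term):
    """Return one box per word on the page that matches term (case-insensitive)."""
    wanted = term.lower()
    hits = []
    root = ET.fromstring(alto_xml)
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        page_width = as_int(page.get("WIDTH"))
        page_height = as_int(page.get("HEIGHT"))
        for word in page.iter():
            if local_name(word.tag) != "String":
                continue
            # Ignore surrounding punctuation so "Socrates," still matches.
            content = word.get("CONTENT", "").strip(".,;:!?\"'()").lower()
            if content != wanted:
                continue
            left, top = as_int(word.get("HPOS")), as_int(word.get("VPOS"))
            hits.append({
                "page": page_number,
                "page_width": page_width,
                "page_height": page_height,
                "l": left,
                "t": top,
                "r": left + as_int(word.get("WIDTH")),
                "b": top + as_int(word.get("HEIGHT")),
            })
    return hits


if __name__ == "__main__":
    print(find_hits(SAMPLE_ALTO, page_number=10, term="socrates"))
```

The same per-word records (page, hpos, vpos, width, height) are what I would expect to store in the index, so the search endpoint would only need to reshape them into the JSON the client expects.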