To be honest I am not familiar with ManifoldCF, so I won't say if Hibernate Search is better or not, but it would definitely not be too hard with Hibernate Search:
1) You annotate with @Indexed the entity referring to your PostgreSQL
table containing the metadata; with @TikaBridge you point it to the
external resource containing the document. Returning database ids is
the default behaviour.
http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244

2) Is a bit more complex but I don't think any more complex than what
it would be with other technologies: you should encode some
information in the index, then define a parametric filter on that.
http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter

3) Not sure, sorry. But the automatic indexing triggers happen as soon
as you store the metadata, so maybe that is good enough?

Looks interesting!

Sanne - Hibernate Search team

On 27 June 2013 03:14, Otis Gospodnetic <[email protected]> wrote:
> Hi,
>
> I would start from ManifoldCF - it may save you some work.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Jun 26, 2013 5:01 PM, "lukasw" <[email protected]> wrote:
>>
>> Hello
>>
>> I'll try to briefly describe my problem and task.
>> My name is Lukas and i am Java developer , my task is to create search
>> engine for different types of file (only text file types) pdf, word, odf,
>> xml but not html.
>> I have got little experience with lucene about year ago i wrote simple
>> full text search using lucene and hibernate search. That was simple
>> project. But now i have got very difficult task with searching.
>> We are using java 1.7 and glassfish 3 and i have to concentrate only
>> server side approach not client ui. Ther is my three major problem :
>>
>> 1) All files is stored on webdav server, but information about file name ,
>> id file typ etc are stored into database (postgresql) so when i creating
>> index i need to use both information. As a result of query i need only
>> return file id from database.
>> Summary content of file is stored in server but information about file
>> is stored in database so we must retrieve both.
>>
>> 2) Secondary problem it that each file has a level of secrecy. But major
>> problem is that this level is calculated dynamically. When calculating
>> level of security for file we considering several properties. The static
>> properties is files location, the folder in which the file is, but also
>> dynamic information user profiles user roles and departments . So when
>> user "Maggie" is logged she can search only files "test.pdf" , "test2.doc"
>> etc but if user "Stev" is logged he have got different profiles such a
>> Maggie so he can only search some phase in file "broken.pdf",
>> "mybook.odt". test2.doc etc ..... . I think that when for example user
>> search phase "lucene +solr" we search in all indexed documents and after
>> that filtered result. But i think that solution is is not very efficient.
>> What if results count 100 files , so what next we filtered step by step
>> each files ? But i do not see any other solution. Maybe you can help me
>> and lucene or solr have got mechanism to help.
>>
>> 3) Last problem is that some files are encrypted. So that files must be
>> indexed only once before encryption ! But i think that if we indexed
>> secure files so we get security issue. Because all word from that file is
>> tokenized.
>> I have not got any idea haw to secure lucene documents and index datastore
>> ? its possible ...
>>
>>
>> Also i have got question that i need to use Solr for my serarch engine or
>> using only lucene and write own search engine ? So as you can see i have
>> not got problem with indexing , serching but with security files and files
>> secured levels.
>>
>> Thanks for any hints and time you spend for me.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
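
P.S. To make the idea in point 2) of the reply more concrete ("encode some information in the index, then define a parametric filter on that"), here is a rough, library-free sketch in plain Java. It is only a model of the approach, not Hibernate Search or Lucene API: the class `AccessFilteredSearch`, the type `IndexedFile` and the token strings such as "role:staff" are all made up for illustration. The assumption is that each file's static access requirements can be written into the index as tokens at indexing time, while the dynamic part (the user's current profile, roles and departments) is resolved at query time and passed in as the filter parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of "encode access info in the index, filter by parameter".
class AccessFilteredSearch {

    /** One index entry: database id, extracted text, and access tokens stored at index time. */
    static final class IndexedFile {
        final long id;
        final String text;
        final Set<String> allowedTokens;

        IndexedFile(long id, String text, String... allowedTokens) {
            this.id = id;
            this.text = text.toLowerCase(Locale.ROOT);
            this.allowedTokens = new HashSet<>(Arrays.asList(allowedTokens));
        }
    }

    /**
     * Returns the database ids of files that contain the term and share
     * at least one access token with the user. The access check is applied
     * per entry while matching, so hidden files never reach the result list.
     */
    static List<Long> search(List<IndexedFile> index, String term, Set<String> userTokens) {
        List<Long> ids = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (IndexedFile file : index) {
            boolean visible = !Collections.disjoint(file.allowedTokens, userTokens);
            if (visible && file.text.contains(needle)) {
                ids.add(file.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<IndexedFile> index = Arrays.asList(
                new IndexedFile(1L, "Lucene and Solr notes", "role:staff"),
                new IndexedFile(2L, "Lucene internals", "dept:research"),
                new IndexedFile(3L, "Holiday plan", "role:staff"));

        // Tokens computed from the logged-in user's roles at query time.
        Set<String> maggie = new HashSet<>(Arrays.asList("role:staff"));
        System.out.println(search(index, "lucene", maggie)); // prints [1]
    }
}
```

With a real index the same restriction would be expressed as a filter evaluated together with the query, so there is no need to walk through the hits one by one afterwards. One caveat of this design: if the file-side rules change (for example a file moves to another folder), that file has to be reindexed so its stored tokens stay current.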
