Hi, I would start from ManifoldCF - it may save you some work.
Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jun 26, 2013 5:01 PM, "lukasw" <[email protected]> wrote:

> Hello,
>
> I'll try to briefly describe my problem and task.
> My name is Lukas and I am a Java developer. My task is to create a search
> engine for different types of files (text file types only): pdf, word, odf,
> xml, but not html.
> I have a little experience with Lucene: about a year ago I wrote a simple
> full-text search using Lucene and Hibernate Search. That was a simple
> project, but now I have a very difficult search task.
> We are using Java 1.7 and GlassFish 3, and I have to concentrate on the
> server side only, not the client UI. These are my three major problems:
>
> 1) All files are stored on a WebDAV server, but information about each file
> (name, id, file type, etc.) is stored in a database (PostgreSQL), so when
> creating the index I need to use both sources. As the result of a query I
> only need to return the file id from the database. In summary, file content
> is stored on the server but file metadata is stored in the database, so we
> must retrieve both.
>
> 2) The second problem is that each file has a level of secrecy, and the
> major difficulty is that this level is calculated dynamically. When
> calculating the security level for a file we consider several properties.
> The static properties are the file's location (the folder the file is in),
> but there is also dynamic information: user profiles, user roles and
> departments. So when user "Maggie" is logged in, she can search only files
> such as "test.pdf" and "test2.doc", but user "Stev" has a different profile
> than Maggie, so he can search for a phrase only in files such as
> "broken.pdf", "mybook.odt", "test2.doc", etc. I think that when, for
> example, a user searches for the phrase "lucene +solr", we search all
> indexed documents and then filter the result. But I think that solution is
> not very efficient. What if the result contains 100 files? Do we then have
> to check each file one by one? But I do not see any other solution. Maybe
> you can help me, and Lucene or Solr has a mechanism for this.
>
> 3) The last problem is that some files are encrypted, so those files must
> be indexed only once, before encryption. But I think that indexing secured
> files creates a security issue, because every word from the file is
> tokenized. I have no idea how to secure the Lucene documents and the index
> data store. Is that possible?
>
> I also have a question: do I need to use Solr for my search engine, or
> should I use only Lucene and write my own search engine? As you can see, my
> problem is not with indexing or searching, but with file security and file
> secrecy levels.
>
> Thanks for any hints and for the time you spend on this.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
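
For the second problem in the quoted message (visibility that depends on the file's folder and on the user's roles and departments), one common pattern is to store access tokens with each document at index time and to restrict the query to the tokens the user holds at search time, rather than searching everything and filtering afterwards. The following is only a minimal sketch of that idea in plain Java, with no Lucene, Solr, or ManifoldCF calls. The names `AclFilterSketch`, `IndexedFile`, `tokens`, `sampleIndex`, and `search`, and the token strings such as `dept:hr`, are hypothetical and are not taken from the original message or from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of token-based access filtering; not a real library API. */
final class AclFilterSketch {

    /** One indexed file: the database id, its text, and the tokens allowed to see it. */
    static final class IndexedFile {
        final long fileId;
        final String text;
        final Set<String> allowTokens; // assumed to be derived from folder/department at index time

        IndexedFile(long fileId, String text, Set<String> allowTokens) {
            this.fileId = fileId;
            this.text = text;
            this.allowTokens = allowTokens;
        }
    }

    /** Small helper to build a token set. */
    static Set<String> tokens(String... values) {
        return new HashSet<String>(Arrays.asList(values));
    }

    /** Illustrative data only; the ids, texts, and tokens are made up. */
    static List<IndexedFile> sampleIndex() {
        return Arrays.asList(
            new IndexedFile(1L, "lucene solr tutorial", tokens("dept:hr")),
            new IndexedFile(2L, "lucene index is broken", tokens("dept:it")),
            new IndexedFile(3L, "solr and lucene notes", tokens("dept:hr", "dept:it")));
    }

    /**
     * Returns the ids of files that contain the phrase and share at least one
     * token with the user. The user's tokens are resolved at query time, so a
     * change of role or department takes effect without re-indexing.
     */
    static List<Long> search(List<IndexedFile> index, String phrase, Set<String> userTokens) {
        List<Long> hits = new ArrayList<Long>();
        for (IndexedFile file : index) {
            boolean permitted = !Collections.disjoint(file.allowTokens, userTokens);
            if (permitted && file.text.contains(phrase)) {
                hits.add(file.fileId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<IndexedFile> index = sampleIndex();
        System.out.println("Maggie: " + search(index, "lucene", tokens("dept:hr")));
        System.out.println("Stev:   " + search(index, "lucene", tokens("dept:it")));
    }
}
```

In a real engine the token restriction would normally be part of the query itself (in Solr, for example, a filter query), so that the engine skips documents the user may not see instead of the application checking each hit one by one.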
