Hi,

I would start from ManifoldCF - it may save you some work.
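
On point 2 (per-user filtering), the usual alternative to filtering results one by one is to store access tokens with each indexed document and to resolve the user's tokens at query time, so the restriction is applied as part of the search itself. As far as I know this is roughly the model ManifoldCF follows. Below is a minimal in-memory sketch of the idea in plain Java. It is not the Lucene, Solr, or ManifoldCF API; the class name, method names, and token strings such as "dept:hr" are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a real index: shows access tokens stored per document.
public class AclSearchSketch {
    // term -> ids of the files whose text contains that term (sorted)
    private final Map<String, Set<Long>> postings = new HashMap<>();
    // file id -> access tokens that are allowed to see the file
    private final Map<Long, Set<String>> allowTokens = new HashMap<>();

    // Index one file: its text plus the tokens that grant access to it.
    public void add(long fileId, String text, Set<String> tokens) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(fileId);
            }
        }
        allowTokens.put(fileId, new HashSet<>(tokens));
    }

    // Return ids of files that contain the term AND share a token with the user.
    public List<Long> search(String term, Set<String> userTokens) {
        List<Long> hits = new ArrayList<>();
        Set<Long> candidates = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet());
        for (Long id : candidates) {
            if (!Collections.disjoint(allowTokens.get(id), userTokens)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

The user's token set would be computed at login or per request from profiles, roles, and departments, so a change in a user's rights does not require reindexing. Only a change in a file's own tokens (for example, moving it to another folder) would. In a real engine the same restriction would normally be expressed as a filter on a token field rather than a loop over hits.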

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 26, 2013 5:01 PM, "lukasw" <[email protected]> wrote:

> Hello
>
> I'll try to briefly describe my problem and task.
> My name is Lukas and I am a Java developer. My task is to create a search
> engine for different types of files (text-based formats only): PDF, Word,
> ODF, and XML, but not HTML.
> I have a little experience with Lucene: about a year ago I wrote a simple
> full-text search using Lucene and Hibernate Search. That was a simple
> project, but now I have a very difficult search task.
> We are using Java 1.7 and GlassFish 3, and I have to concentrate only on
> the server side, not the client UI. These are my three major problems:
>
> 1) All files are stored on a WebDAV server, but information about each file
> (name, id, file type, etc.) is stored in a database (PostgreSQL), so when
> creating the index I need to use both sources. As the result of a query I
> only need to return the file id from the database. In short, the file
> content is stored on the server but the information about the file is
> stored in the database, so we must retrieve both.
>
> 2) The second problem is that each file has a level of secrecy, and the
> major difficulty is that this level is calculated dynamically. When
> calculating the security level for a file we consider several properties:
> static ones such as the file's location (the folder the file is in), but
> also dynamic information such as user profiles, user roles, and
> departments. So when user "Maggie" is logged in she can search only the
> files "test.pdf", "test2.doc", etc., but user "Stev" has different
> profiles than Maggie, so he can only search for a phrase in the files
> "broken.pdf", "mybook.odt", "test2.doc", etc. I think that when, for
> example, a user searches for the phrase "lucene +solr", we search all
> indexed documents and filter the results afterwards. But I think that
> solution is not very efficient. What if the result contains 100 files? Do
> we then filter the files one by one? I do not see any other solution.
> Maybe you can help me, and Lucene or Solr has a mechanism for this.
>
> 3) The last problem is that some files are encrypted, so those files must
> be indexed only once, before encryption. But I think that indexing secure
> files creates a security issue, because every word from the file is
> tokenized. I have no idea how to secure Lucene documents and the index
> data store. Is it possible?
>
>
> I also have a question: do I need to use Solr for my search engine, or
> should I use only Lucene and write my own search engine? As you can see, I
> do not have a problem with indexing or searching, but with file security
> and file security levels.
>
> Thanks for any hints and for the time you spend on this.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
