Hi Shai,

Thank you very much. I have succeeded in indexing my files and running Solr.
But actually, I expected that I could import Lucene as a library (I am not a Java expert, I am more familiar with C/C++) and call some Lucene functions. Could you give me the URL of a Lucene 4 tutorial that is useful for a Java newbie?

----------------------------------
Thanks and Best Regards
Vinh Dang (Msc.)
Project Manager
FPT Software
Mobile: +84 982 058 956
Skype: dqvinh87
Y!M: dqvinh87
Email: dqvin...@gmail.com
Websites: http://www.vinhdq.blogspot.com

On Tue, Jul 9, 2013 at 7:26 AM, Shai Erera <ser...@gmail.com> wrote:

> Well ... at a high level, this is what you should do:
>
>    1. Integrate with Apache Tika for parsing the .DOC files (and maybe
>    other office files you have)
>    2. Tika extracts the contents of the document, as well as some metadata
>    3. Create a Lucene Document object to which you add Fields:
>       1. TextField for e.g. the "content" field
>       2. StringField for e.g. the path to the document on the file system
>       3. NumericDocValuesField for e.g. the document's modification date
>       4. Perhaps another StringField for the document's type (Word,
>       PowerPoint)
>    4. Index these documents with IndexWriter
>    5. Search using IndexSearcher
>
> I'm sure there are a lot of Lucene tutorials around, for example:
> http://www.lucenetutorial.com/lucene-in-5-minutes.html. It covers pretty much
> what I've mentioned above.
>
> From there, you can expand to add search results highlighting (summaries /
> snippets) using e.g. PostingsHighlighter, faceted search using Lucene
> facets, spelling correction and more.
>
> Also, are you aware of Solr, which is a search engine developed on top of
> Lucene? It takes care of all that for you, and has some pretty good
> tutorials and documentation.
> If you're not aiming to do something very challenging with these documents,
> I think Solr can help you set up search very quickly, without writing any
> code.
>
> Shai
>
>
> On Tue, Jul 9, 2013 at 2:44 AM, Vinh Dang <dqvin...@gmail.com> wrote:
>
> > Sorry for my typo,
> >
> > I mean Lucene 4.3.1,
> >
> > Thank Beale from US for that :)
> >
> > ---
> > Best Regards
> > Vinh Dang
> > dqvin...@gmail.com
> >
> > On Jul 8, 2013, at 9:46 PM, Vinh Dang <dqvin...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I am very new to Lucene, so please forgive me if my question is quite
> > > stupid.
> > >
> > > I spent a whole day googling how to get started with Lucene 4.6.1, but
> > > failed. I found some clear tutorials, but they were written for very old
> > > Lucene versions (mostly 2.x).
> > >
> > > My task is:
> > > I have a folder which contains multiple .DOC files with Unicode
> > > characters (actually, they are Vietnamese characters).
> > > I want to index this folder with Lucene (4.6.1 would be best, but other
> > > versions are OK).
> > >
> > > Could you give me a starting point?
> > >
> > > Thank you very much,
> > >
> > > ---
> > > Best Regards
> > > Vinh Dang
> > > dqvin...@gmail.com
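
For anyone following this thread, the five steps Shai lists (parse with Tika, build a Document, index with IndexWriter, search with IndexSearcher) could be sketched roughly as below. This is only a sketch under stated assumptions, not code from the thread: it assumes Lucene 4.3.x (lucene-core, lucene-analyzers-common, lucene-queryparser) and Apache Tika are on the classpath, and Java 7 or later. The class name DocFolderIndexer, the field names "path", "modified" and "type", and the limit of 10 hits are hypothetical choices made for illustration. Whether StandardAnalyzer tokenizes Vietnamese text well enough for a given use case is also an assumption that would need checking.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;

// Hypothetical helper class; the name and layout are illustrative only.
public class DocFolderIndexer {
    // Assumes Lucene 4.3.x; adjust the constant for another 4.x release.
    private static final Version V = Version.LUCENE_43;

    // Step 3: build a Lucene Document with the fields mentioned in the thread.
    static Document toDocument(String path, String content, long modified, String type) {
        Document doc = new Document();
        doc.add(new TextField("content", content, Field.Store.NO));   // analyzed full text
        doc.add(new StringField("path", path, Field.Store.YES));      // exact, stored value
        doc.add(new NumericDocValuesField("modified", modified));     // modification date
        doc.add(new StringField("type", type, Field.Store.YES));      // e.g. "Word"
        return doc;
    }

    // Steps 1, 2 and 4: extract text from each .doc file with Tika and index it.
    static void indexFolder(File folder, Directory indexDir) throws Exception {
        Tika tika = new Tika();
        IndexWriterConfig cfg = new IndexWriterConfig(V, new StandardAnalyzer(V));
        try (IndexWriter writer = new IndexWriter(indexDir, cfg)) {
            File[] files = folder.listFiles();
            if (files == null) {
                return; // not a directory, or it could not be read
            }
            for (File f : files) {
                if (!f.isFile() || !f.getName().toLowerCase().endsWith(".doc")) {
                    continue;
                }
                String text = tika.parseToString(f);
                writer.addDocument(toDocument(f.getPath(), text, f.lastModified(), "Word"));
            }
        }
    }

    // Step 5: search the "content" field and return the stored paths of the top hits.
    static List<String> search(Directory indexDir, String queryText) throws Exception {
        Query query = new QueryParser(V, "content", new StandardAnalyzer(V)).parse(queryText);
        try (DirectoryReader reader = DirectoryReader.open(indexDir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            List<String> paths = new ArrayList<String>();
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                paths.add(searcher.doc(hit.doc).get("path"));
            }
            return paths;
        }
    }

    // Usage: java DocFolderIndexer <docFolder> <indexFolder> <query>
    public static void main(String[] args) throws Exception {
        Directory indexDir = FSDirectory.open(new File(args[1]));
        try {
            indexFolder(new File(args[0]), indexDir);
            for (String path : search(indexDir, args[2])) {
                System.out.println(path);
            }
        } finally {
            indexDir.close();
        }
    }
}
```

The same analyzer is used at index time and at query time so that query terms are tokenized and lowercased the same way as the indexed text. Running the indexer twice on the same folder would add the documents again, since the sketch does not delete or update existing entries.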