Hi,
In my program, I can index and search documents based on unigrams. I
modified the code as follows to index bigrams instead. However, the
indexed terms are still single words, not the bigrams I expected.
*****************
public static void createIndex() throws CorruptIndexException,
        LockObtainFailedException, IOException {
    final String[] NEW_STOP_WORDS = {"a", "able", "about", "actually",
            "after", "allow", "almost", "already", "also", "although",
            "always", "am", "an", "and", "any", "anybody"}; // only a portion
    SnowballAnalyzer analyzer = new SnowballAnalyzer("English", NEW_STOP_WORDS);
    Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
    ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
    sw.setOutputUnigrams(false);
    IndexWriter w = new IndexWriter(INDEX_DIRECTORY, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    File dir = new File(FILES_TO_INDEX_DIRECTORY);
    File[] files = dir.listFiles();
    for (File file : files) {
        Document doc = new Document();
        String text = "";
        doc.add(new Field("contents", text, Field.Store.YES,
                Field.Index.UN_TOKENIZED, Field.TermVector.YES));
        Reader reader = new FileReader(file);
        doc.add(new Field(FIELD_CONTENTS, reader));
        w.addDocument(doc);
    }
    w.optimize();
    w.close();
}
****************
The output is still:
{contents: /1, assist/1, fine/1, librari/1, librarian/1, main/1, manjula/3,
name/1, sabaragamuwa/1, univers/1}
*******************
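For illustration, here is a minimal plain-Java sketch of the kind of bigram terms expected instead of the single words above. It does not use Lucene. The class name BigramSketch, the bigrams helper, and the token order are all hypothetical, and joining each pair with a single space is an assumption about how the shingles would be written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper: joins each pair of adjacent tokens with a space.
    // Assumption: this mirrors what a two-word shingle looks like.
    static List<String> bigrams(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stemmed terms taken from the output above; their order is made up.
        List<String> tokens =
                Arrays.asList("main", "librarian", "sabaragamuwa", "univers");
        System.out.println(bigrams(tokens));
    }
}
```

For that made-up token order, it prints [main librarian, librarian sabaragamuwa, sabaragamuwa univers].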
If anybody can, please help me to obtain the correct output.
Thanks,
Manjula.