The problem with your code is simple: you cannot consume a TokenStream twice 
(it behaves like an iterator). Once you consume it with the System.out.println() 
loop, it can no longer be consumed by the indexer. The same happens when you add 
the same TokenStream to several Fields to index.
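
To illustrate the iterator comparison, here is a minimal sketch in plain Java. It deliberately does not use Lucene: the class OneShotDemo and the helper drain() are made up for this illustration, and a java.util.Iterator stands in for the TokenStream. It only shows the general behaviour of a one-shot source, under the assumption that a TokenStream behaves the same way once its tokens have been read.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; an Iterator stands in for a one-shot token source.
public class OneShotDemo {

    // Reads everything that is left in the iterator and joins the items
    // with ':' in the same style as the print loops in the original code.
    static String drain(Iterator<String> it) {
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            sb.append(it.next()).append(':');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("welcome", "q&a", "professional");

        // First pass consumes the source completely.
        Iterator<String> stream = tokens.iterator();
        System.out.println("loop 1: " + drain(stream));

        // Second pass over the SAME object finds nothing left to read.
        System.out.println("loop 2: " + drain(stream));

        // A freshly created source can be read again from the start.
        Iterator<String> fresh = tokens.iterator();
        System.out.println("fresh:  " + drain(fresh));
    }
}
```

Carried over to the code in question, the idea would be to pass the Field a newly created TokenStream that nothing else has read yet, rather than the one already used in the print loop.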

 

Still, I don't understand the whole problem; it looks like an XY problem:
http://www.perlmonks.org/index.pl?node_id=542341

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: andrzej_gadek [mailto:[email protected]] 
Sent: Sunday, July 24, 2011 9:11 PM
To: [email protected]
Subject: [update] adding field with constructor demanding tokenStream fails - 
Field(name, tokenStream, termVector) BUG

 

update ->

 

I have found out that the problem comes from these constructors:

 

new Field(name, tokenStream)
new Field(name, tokenStream, termVector)

 

how?

 

[code]

StandardAnalyzer stAnalyzer = new StandardAnalyzer(Version.LUCENE_30);


TokenStream stStream = stAnalyzer.tokenStream("analizedContent", new 
StringReader(handler.toString()));
 

TermAttribute term = stStream.addAttribute(TermAttribute.class);
System.out.println("loop 1");
while(stStream.incrementToken()){


System.out.print(term.term() + ":");
}
System.out.println();

stStream.reset();


Field field = new Field("analizedContent", stStream, TermVector.YES);
field.setTokenStream(stStream);


TokenStream tSV = field.tokenStreamValue();
TermAttribute term2 = stStream.addAttribute(TermAttribute.class);

 

System.out.println("loop 2");
while(tSV.incrementToken()){
System.out.print(term2.term() + ":");
}
System.out.println();


System.out.println ("field.readerValue(): " + field.toString());
System.out.println ("field.readerValue(): " + field.readerValue());
System.out.println ("fieldfield.stringValue(): " + field.stringValue());

[/code]

 

and now what I get in the console:

[example]


loop 1
welcome:q&a:professional:enthusiast:programmers:check:out:faq:stack:exchange:log:careers:chat:meta:about:faq:stack:overflow:questions:tags:users:badges:unanswered:ask:question:what:difference:between:getpath:getabsolutepath:getca...
 /*and many more ;-)*/


field.readerValue(): indexed,tokenized,termVector<analizedContent:>
loop 2

field.readerValue(): null
fieldfield.stringValue(): null
[/example]

 

my comment: after creating a new Field we lose the value of the passed 
tokenStream. So the problem occurs when you want to index pre-analyzed text, 
for example with an analyzer other than the default one (in my case a Polish one)

 

my conclusion: something is not working right!

Probably nobody will help me, so I'm going to find some alternative way to do 
this. For example, I will make a String from the analyzed text and then use the 
default analyzer to parse it. 

 

Andrew

 
