在google里搜一下中文分词,出车东的包外,应该还有很多了,如果你发现有更好分词,更高效率的,也推荐一份啊。
--------------------------------------------------
From: "kai.hu" <[EMAIL PROTECTED]>
Sent: Sunday, May 04, 2008 4:20 PM
To: <java-user@lucene.apache.org>
Subject: Re: Need addtional info for Field(希望看得懂中文的朋友帮我出出主意)
你只要索引并分词“下午去开会”就行了,把对应的时间保存进去。
如document.add(new Field("sub","下午去开会",Field.Store.YES,Field.Index.TOKENIZED));
document.add(new
Field("time","01:02:02",Field.Store.YES,Field.Index.UN_TOKENIZED));
到时候搜索出的单个document里就包含这两个Field了。
only index and tokenized "下午去开会",and store the time with this sub.
--------------------------------------------------
From: "Cedric Ho" <[EMAIL PROTECTED]>
Sent: Tuesday, April 22, 2008 3:36 PM
To: <java-user@lucene.apache.org>
Subject: Re: Need addtional info for Field(希望看得懂中文的朋友帮我出出主意)
In that case you may want to index each:
Field("Sub","下午去开会","01:02:02");
as a separate document. So your document contains 3 fields
1. title
2. time
3. sub
then you can get both title and time by searching the "sub" field.
Cedric
2008/4/22 王建新 <[EMAIL PROTECTED]>:
谢谢,我只是检索sub,不检索时间,在检索sub时,只想得到匹配Field对应的时间。
用payload似乎不可以?
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Tuesday, April 22, 2008 1:55 PM
Subject: RE: Need addtional info for Field(希望看得懂中文的朋友帮我出出主意)
Try to use payload which is stored as additional information. Currently
lucene only support per token payload, but you can add an arbitrary
token for the time information.
I am not sure what are the query information? Only the subtitle or both
subtitle and time?
Regards,
-----Original Message-----
From: 王建新 [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 22, 2008 1:06 PM
To: java-user
Subject: Need addtional info for Field(希望看得懂中文的朋友帮我出出主意)
用英文可能描述得不是很清楚,不好意思:)
----- Original Message -----
From: 王建新
To: Chris
Sent: Tuesday, April 22, 2008 9:52 AM
Subject: Re: Need addtional info for Field
谢谢。
我的问题是这样的:要对一批视频文件(video)建立索引(index),在建立索引之前,我已经分析出了在视频的什么时间出现了什么样的字幕内容。
在这种情况下,一个视频节目就相当于一个Document,那么需要(希望)对字幕建立索引,如下:
Field("Sub","下午去开会","01:02:02");
Field("Sub","后天去开会","01:03:05");
[注:"01:02:02"是附属的时间,lucene没有提供这种用法。]
这两个Field表示在当前的视频节目中,在01:02:02时间出现了字幕"下午去开会",在01:03:05时间出现了"后天去开会",如果用户(User)搜索"下午",当前视频节目是可以匹配的,但是只匹配到了第一个Field,只需要知道时间"01:02:02"。如果用户搜索"开会",则两个Field都可以匹配到。因此需要知道时间"01:02:02"和"01:03:05"。
不知道我有没有说清楚。
我想知道lucene是不是可以通过某种方式解决这个问题,如果不行的话,需要怎样修改lucene呢?
王建新
----- Original Message -----
From: Chris
To: 王建新
Sent: Monday, April 21, 2008 7:34 PM
Subject: Re: Need addtional info for Field
您的功能可以再清楚一點嗎,因為其實這樣處理,好像要斷詞....
但看到您沒斷,而且欄位名稱一樣是 multi-pair 值的話,不是用 String 存哦
以上
Chris.
2008/4/21, 王建新 <[EMAIL PROTECTED]>:
你看得懂中文吗?
我不是很明白你的意思。
你是说可以用lucene现有的功能来解决这个问题吗?
----- Original Message -----
From: Chris
To: 王建新
Sent: Monday, April 21, 2008 5:14 PM
Subject: Re: Need addtional info for Field
This problem is not solve with lucene but or method will solve it.
The structure is not define as this as well ......
You may check it clear....
above
Chris.
2008/4/21, 王建新 <[EMAIL PROTECTED]>:
hi Chris, it is me "王建新"
I have a new problem, Could you give me any advice? Thank you.
I want to use lucene with some additional info,like:
1.index
Document additionalDoc=ew Document()
additionalDoc.add(new Field("field","AA BB","Addtional info
..............."));
additionalDoc.add(new Field("field","BB CC","Addtional info
222222222222222222222222..............."));
writer.addDocument(additionalDoc)
........
2. search
Searcher searcher;
....
searcher.search(termQuery("field","BB"));
in this condition, I want lucene returns the additionalDoc ,
also know which fileds were matched, then I will get the additional info
from the matched fields.
Can lucene make it in version 2.3.1?
--
Chris Lin
[EMAIL PROTECTED]
Taipei , Taiwan.
-----------------------------------------------------------
--
Chris Lin
[EMAIL PROTECTED]
Taipei , Taiwan.
-----------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]