I would like to use the data stored in the Lucene indexes, like the words and
their frequencies, and store them in a database. Can anyone suggest a way of
going about it, or is it possible at all?
TIA
Prasanna
Take a look at TermDocs and TermEnum.
-Grant
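Grant's pointer, sketched against the Lucene 2.x API of the time: TermEnum walks every term in the index and reports its document frequency, which can be written straight to a database. The JDBC URL and the term_freqs table here are invented for illustration, not part of any real setup:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Walk every term in the index and store (field, term, docFreq) rows
// in a database table. Table name and JDBC URL are placeholders.
public class ExportTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        Connection conn = DriverManager.getConnection("jdbc:yourdb://...");
        PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO term_freqs (field, term, doc_freq) VALUES (?, ?, ?)");
        TermEnum terms = reader.terms();
        while (terms.next()) {
            Term t = terms.term();
            ps.setString(1, t.field());
            ps.setString(2, t.text());
            ps.setInt(3, terms.docFreq()); // number of documents containing the term
            ps.executeUpdate();
        }
        terms.close();
        reader.close();
        conn.close();
    }
}
```

If you need per-document occurrence counts rather than document frequencies, open reader.termDocs(t) for each term and read doc() and freq() from it as it advances.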
On Dec 13, 2006, at 6:02 AM, Venkateshprasanna wrote:
I would like to use the data stored in the Lucene indexes, like the
words and
their frequencies and store them in a database. Can anyone suggest
a way of
going about it or is it possible
Hello All,
Apologies if it is a naive question.
a) Indexing a large file (more than 4 MB)
Do I need to read the entire file as a string using
java.io and create a Document object?
The file contains timestamps; if I need to index on the
timestamp, is parsing the entire file manually
Let me take a crack at it. See below...
On 12/13/06, abdul aleem [EMAIL PROTECTED] wrote:
Hello All,
Apologies if it is a naive question.
a) Indexing a large file (more than 4 MB)
Do I need to read the entire file as a string using
java.io and create a Document object?
Essentially yes.
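In Lucene 2.x terms, that could look roughly like the following; the field names "path" and "contents" are just examples, not anything the thread specifies:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FileToDocument {
    // Read the whole file into one String and wrap it in a Document.
    public static Document toDocument(File file) throws Exception {
        StringBuffer sb = new StringBuffer();
        BufferedReader in = new BufferedReader(new FileReader(file));
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        in.close();
        Document doc = new Document();
        // Store the path so hits can be traced back to the file.
        doc.add(new Field("path", file.getPath(),
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Tokenize the body for searching, without storing the full text.
        doc.add(new Field("contents", sb.toString(),
                Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}
```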
Many thanks, Erick.
Your points are valid. I was thinking of the entire log file
as one Lucene document; I was wrong, and chopping the log
file up might be the way to go.
My bad expressions, but yes, you got that right:
the timestamp must be added as a FIELD, that is what I
meant.
I really appreciate your detailed reply.
Do you know about any papers that discuss this?
Karl
Original Message
Date: Wed, 13 Dec 2006 10:31:41 -0500
From: Yonik Seeley [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor
On 12/13/06, Karl Koch [EMAIL PROTECTED] wrote:
One other thing I discovered that I mention so no one else is tripped up
by it.
I set the boost to zero for the categories in the query. When I ran my
unit tests, some of them started to fail. I eventually realized that
the failures were in searches where I only wanted to find documents in
Hi,
First, let me explain the situation:
We have to index a document which contains a field, file, to store
filenames.
Sometimes filenames contain an underscore or a minus (_ or -),
e.g. foo_bar.doc.
Indexing isn't the problem so far.
But if we now try to search for foo_b* the
Hi,
I have a problem:
I must create a term-document matrix in which every element of the
matrix represents the number of occurrences of that term in the document.
How can I do this?
Can someone help me?
Thanks to all.
P.S. I must apply LSA to this matrix.
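One way to build such a matrix is plain counting over tokenized documents; a minimal pure-Java sketch, assuming whitespace tokenization (if you index with Lucene, substitute your analyzer's token stream, or read the counts back via TermEnum/TermDocs):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TermDocMatrix {
    // Build a term-by-document count matrix: rows = terms (sorted),
    // columns = documents, cell = occurrences of the term in that document.
    // The sorted term list is written into termsOut so rows can be labeled.
    public static int[][] build(List<String> docs, List<String> termsOut) {
        TreeSet<String> vocab = new TreeSet<String>();
        List<Map<String, Integer>> counts = new ArrayList<Map<String, Integer>>();
        for (String doc : docs) {
            Map<String, Integer> c = new LinkedHashMap<String, Integer>();
            for (String tok : doc.toLowerCase().split("\\s+")) {
                if (tok.length() == 0) continue;
                vocab.add(tok);
                Integer n = c.get(tok);
                c.put(tok, n == null ? 1 : n + 1);
            }
            counts.add(c);
        }
        termsOut.addAll(vocab);
        int[][] m = new int[vocab.size()][docs.size()];
        int row = 0;
        for (String term : vocab) {
            for (int col = 0; col < docs.size(); col++) {
                Integer n = counts.get(col).get(term);
                m[row][col] = n == null ? 0 : n;
            }
            row++;
        }
        return m;
    }
}
```

For LSA you would then feed this matrix (usually after tf-idf weighting) to a singular value decomposition.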
I recognize that error message ;) You're using AnalyzingQueryParser
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html -
yes?
These are imo the two most obvious options:
1. Revert to standard QueryParser - it won't analyze prefix- and
I would suggest you take a look at eXist-db (http://exist-db.org/),
a database for XML documents that supports XQuery.
We are using both products here (Lucene and eXist-db), and for what you are
looking for, eXist-db seems the better fit.
Our documents are far more complex than yours (about 500
Hi Mark:
For 10 million records we recommend a strong database such as Oracle.
You can annotate the schema (.xsd) which describes your XML record
to store some fields in traditional VARCHAR2 or NUMBER columns to query
them faster, and DRECONTENT in a CLOB column.
You can find more information
A Lucene RangeQuery would do for the time and numeric requirements.
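Note that RangeQuery compares terms lexicographically, so numbers and timestamps should be indexed in a fixed-width, zero-padded form for ranges to behave numerically. A small helper (the width of 10 is only an example):

```java
public class NumberPad {
    // Left-pad a non-negative number to a fixed width so that
    // lexicographic term order matches numeric order.
    // e.g. pad(42, 10) -> "0000000042"
    public static String pad(long n, int width) {
        StringBuffer sb = new StringBuffer(Long.toString(n));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}
```

You would then index the padded value in its own field and query it with something like new RangeQuery(new Term("price", pad(100, 10)), new Term("price", pad(500, 10)), true); the field name here is invented.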
Mark Mei [EMAIL PROTECTED] wrote:
At the bottom of this email is the sample xml file that we are using
today.
We have about 10 million of these.
We need to know whether Lucene can support the following functionalities.
(1) Each field
On Wednesday 13 December 2006 14:10, abdul aleem wrote:
a) Indexing large file ( more than 4MB )
Do i need to read the entire file as string using
java.io and create a Document object ?
You can also use a reader:
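For example, with the Lucene 2.x Field(String, Reader) constructor, which tokenizes the stream but does not store the content (the field and file names are illustrative):

```java
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Let Lucene stream the file contents through a Reader
// instead of loading the whole file into one String.
Document doc = new Document();
doc.add(new Field("contents", new FileReader(new File("big.log"))));
```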
You are right. A database is usually in 3NF, while Lucene usually works
on an array of objects. Different databases have different data models.
There have been quite a few efforts to crawl a database, create the Lucene
index, keep it in sync with the database, and render the search
results. If the data model
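The crawl step, at its simplest, is a SELECT loop that turns each row into a Document. The table and column names below are invented, and the sync logic (e.g. tracking a last-modified column) is left out:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class DbCrawler {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:yourdb://...");
        // true = create a fresh index; a real sync job would update instead.
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("SELECT id, title, body FROM articles");
        while (rs.next()) {
            Document doc = new Document();
            doc.add(new Field("id", rs.getString("id"),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("title", rs.getString("title"),
                    Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("body", rs.getString("body"),
                    Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();
        conn.close();
    }
}
```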
On Wednesday 13 December 2006 16:42, Karl Koch wrote:
Do you know about any papers that discuss this?
Coordination is called co-ordination in the original idf paper by
K. Spärck Jones, "A statistical interpretation of term specificity
and its application in retrieval", Journal of Documentation
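For reference, in Lucene's DefaultSimilarity the coord factor being discussed is simply the fraction of the query's terms that a document matched:

```java
public class Coord {
    // DefaultSimilarity's coordination factor: the fraction of the
    // query's terms that a document actually matched. Documents
    // matching more of the query score proportionally higher.
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```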
: For 10 million records We recommend an strong database such as Oracle.
eh ... who is We in that statement?
I suspect you'll find other people on this list who have no problems
running Lucene indexes containing 10 million documents.
If you want a database, then by all means use a database,
Hi Chris:
On 12/13/06, Chris Hostetter [EMAIL PROTECTED] wrote:
: For 10 million records We recommend an strong database such as Oracle.
eh ... who is We in that statement?
We are independent consultants who have been working for many years with Oracle databases ;)
I Suspect you'll find other people
Hi,
Has anyone indexed an Excel file before? I took a look at the API classes
provided by POI HSSF; however, I did not find any method to extract the text
from an Excel file and index it.
Please assist and let me know where I can find an example to refer to.
Thanks
regards,
Wooi Meng
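With the POI HSSF classes of that era there is indeed no single extract-all method; the text has to be collected sheet by sheet, row by row, cell by cell. A rough sketch against the old org.apache.poi.hssf.usermodel API:

```java
import java.io.FileInputStream;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ExcelText {
    // Collect the text of every string cell in the workbook into one
    // String, which can then be put into a Lucene Field.
    public static String extract(String path) throws Exception {
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(path));
        HSSFWorkbook wb = new HSSFWorkbook(fs);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            HSSFSheet sheet = wb.getSheetAt(i);
            for (Iterator rows = sheet.rowIterator(); rows.hasNext();) {
                HSSFRow row = (HSSFRow) rows.next();
                for (Iterator cells = row.cellIterator(); cells.hasNext();) {
                    HSSFCell cell = (HSSFCell) cells.next();
                    if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
                        sb.append(cell.getStringCellValue()).append(' ');
                    }
                }
            }
        }
        return sb.toString();
    }
}
```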
I think the last structure is good. The index should be structured
according to how you want to search it. If your needs change, you
should simply have another index. One index for everything is not really
good. An index is more a matter of trading space for time, so duplication is not
really a concern.
The first
As you may have already heard, IBM and Yahoo! today released a new
product named IBM OmniFind Yahoo! Edition
(http://omnifind.ibm.yahoo.net/productinfo.php).
It is a free-of-charge
search engine for web sites and file systems, which builds on Lucene
and other components such as UIMA