The problem you may face with such large documents is that there
is a high probability that most terms will be present in all
documents.
So on search you'll receive a lot of documents (if you need to
retrieve full text, it will take a while), but the bigger problem is
usability: what a
* mark harwood:
Could you get a heap dump (eg with YourKit) of what's using up all the
memory when you hit OOM?
On this particular machine I have a JRE, no admin rights and
therefore limited profiling capability :(
Maybe this could give you a heap dump which you can analyze on a
different
Thanks, I have a heap dump now from a run with reduced JVM memory (in order to
speed up a failure point) and am working through it offline with VisualVm.
This test induced a proper OOM as opposed to one of those "timed out waiting
for GC" type OOMs so may be misleading.
The main culprit in this
mark harwood wrote:
Thanks, I have a heap dump now from a run with reduced JVM memory
(in order to speed up a failure point) and am working through it
offline with VisualVm.
This test induced a proper OOM as opposed to one of those "timed out
waiting for GC" type OOMs so may be
Michael McCandless wrote:
Ie, it's still not clear if you are running out of memory vs hitting
some weird "it's too hard for GC to deal" kind of massive heap
fragmentation situation or something. It reminds me of the special
("I cannot be played on record player X") record (your application)
Well, PerFieldAnalyzerWrapper is just a bunch of Analyzers, independent of
queries. See the API, but in general
// field names and the Snowball stemmer name are placeholders
PerFieldAnalyzerWrapper perf = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
perf.addAnalyzer("untokenized", new WhitespaceAnalyzer());
perf.addAnalyzer("tokenized", new SnowballAnalyzer("English"));
Mark Miller wrote:
Michael McCandless wrote:
Ie, it's still not clear if you are running out of memory vs
hitting some weird "it's too hard for GC to deal" kind of massive
heap fragmentation situation or something. It reminds me of the
special ("I cannot be played on record player X")
OK, it's early days and I'm holding my breath but I'm currently progressing
further through my content without an OOM just by using a different GC setting.
Thanks to advice here and colleagues at work I've gone with a GC setting of
-XX:+UseSerialGC for this indexing task.
The rationale that
Hi all,
Can anyone explain how the function integer2String works?
public static int int2sortableStr(int val, char[] out, int offset) {
val += Integer.MIN_VALUE;
out[offset++] = (char)(val >>> 24);
out[offset++] = (char)((val >>> 12) & 0x0fff);
out[offset++] = (char)(val & 0x0fff);
On Wed, Mar 11, 2009 at 9:54 AM, Allahbaksh Mohammedali Asadullah
allahbaksh_asadul...@infosys.com wrote:
Hi all,
Can anyone explain how the function integer2String works?
public static int int2sortableStr(int val, char[] out, int offset) {
val += Integer.MIN_VALUE;
This maps MIN_VALUE to
We are having a problem running searches on an index after upgrading to
2.4 and using the new Field.setOmitTf() function. The index size has
been dramatically reduced and even the search performance is better. But
searches do not return any results if searching for something that has a
space
Hi,
I didn't get what exactly shifting by 24 and shifting by 12 does. Is
there any character at that value, or is it some differentiator?
Can someone go into a bit more detail?
Regards.
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
On Wed, Mar 11, 2009 at 10:25 AM, Allahbaksh Mohammedali Asadullah
allahbaksh_asadul...@infosys.com wrote:
Hi,
I didn't get what exactly shifting by 24 and shifting by 12 does.
Is there any character at that value, or is it some differentiator?
Can someone go into a bit more detail?
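To make the shifts concrete, here is a minimal, self-contained sketch of the encoding being asked about. The three assignments are the ones from the posted method; the class name, the `return 3` (the posted snippet is cut off before its return statement), and the encode/decode helpers are assumptions added for illustration only. Adding Integer.MIN_VALUE flips the sign bit, so the 32 bits then order like an unsigned number; the shifts simply cut those bits into an 8-bit chunk and two 12-bit chunks, one per char, so that plain string comparison agrees with numeric order.

```java
class SortableIntSketch {
    // Same body as the posted method; "return 3" is an assumption,
    // since the posted snippet is truncated before its return statement.
    static int int2sortableStr(int val, char[] out, int offset) {
        val += Integer.MIN_VALUE;                       // flip the sign bit: MIN_VALUE becomes 0
        out[offset++] = (char) (val >>> 24);            // top 8 bits
        out[offset++] = (char) ((val >>> 12) & 0x0fff); // middle 12 bits
        out[offset++] = (char) (val & 0x0fff);          // low 12 bits
        return 3;                                       // number of chars written (assumed)
    }

    // Hypothetical helper: encode an int into a 3-char string.
    static String encode(int val) {
        char[] buf = new char[3];
        int2sortableStr(val, buf, 0);
        return new String(buf);
    }

    // Hypothetical helper: reassemble the 8 + 12 + 12 bits and undo the offset.
    static int decode(String s) {
        int val = (s.charAt(0) << 24) | (s.charAt(1) << 12) | s.charAt(2);
        return val - Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -5, 0, 3, 4096, Integer.MAX_VALUE};
        for (int i = 1; i < samples.length; i++) {
            String a = encode(samples[i - 1]);
            String b = encode(samples[i]);
            // Lexicographic order of the strings follows numeric order of the ints.
            System.out.println(samples[i - 1] + " < " + samples[i]
                    + " : " + (a.compareTo(b) < 0));
        }
    }
}
```

So there is no special character at "24" or "12"; each char is just a container for a slice of the bits, and no slice exceeds 0x0fff.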
Ganesh wrote:
Mike, in one of his replies to the thread "Faceted search using Lucene",
gave the following code review comment:
* You are creating a new Analyzer & QueryParser every time, also
creating unnecessary garbage; instead, they should be created once &
reused.
This made me ask the below
Allahbaksh Mohammedali Asadullah wrote:
For example I want to search amount = 15 rather than doing it
amount:[ 15] or something?
Is there any open source queryparser which converts something like
amount =15 into lucene number format query.
I don't know of any effort to change Lucene's
Hmmm - you can probably get qsol to do it: http://myhardshadow.com/qsol.
I think you can set up any token to expand to anything with a regex
matcher and use group capturing in the replacement (I don't fully
remember though, been a while since I've used it).
So you could do a regex of something
Yonik Seeley wrote:
On Mon, Mar 9, 2009 at 2:02 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Once added, something inside the index (a write once schema) records
that this field is an IntField and then it's an error to ever use a
different type field by that same name.
I
Hi Lucene professionals!
This may sound like a dumb beginner's question, but anyways: Can Lucene
run out of memory during indexing?
Should I use IndexWriter.flush() or .commit(), and if so, how often?
Thank you for your support.
Niels
--
Niels Ott
Computational Linguist (B.A.)
If you can supply a JUnit test that recreates the problem I think we can
start to make progress on this.
Amin Mohammed-Coleman wrote:
Hi
Apologies for resending this mail. Just wondering if anyone has
experienced the below. I'm not sure if this could happen due to the nature
of the document. It
Hi Niels,
See the javadocs for IndexWriter.setRAMBufferSizeMB()
Cheers
Mark
Niels Ott wrote:
Hi Lucene professionals!
This may sound like a dumb beginner's question, but anyways: Can
Lucene run out of memory during indexing?
Should I use IndexWriter.flush() or .commit(), and if so, how
On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
luc...@mikemccandless.com wrote:
This is expected: phrase searches will not work when you omitTf.
But why would a phrase query be created? The code given looks like it
should create a boolean query with two terms.
Of course, the given code
Yonik Seeley wrote:
On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
luc...@mikemccandless.com wrote:
This is expected: phrase searches will not work when you omitTf.
But why would a phrase query be created? The code given looks like it
should create a boolean query with two
Siraj Haider wrote:
Yonik Seeley wrote:
On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
luc...@mikemccandless.com wrote:
This is expected: phrase searches will not work when you omitTf.
But why would a phrase query be created? The code given looks like it
should create a boolean
: For a 'SpanNearQuery', this reduces the effect of the term frequency on the
: score as the number of terms in the span increases. So, for a simple phrase
: query (using spans), the longer the phrase, the lower the TF. For a simple
: SpanTermQuery, the TF is reduced in half (1.0f / (1 + 1)).
:
:
: For a SpanNearQuery that contains SpanTermQueries, the score for a match on
: "the quick brown fox" would be lower than a match on "brown fox" because of
: the edit distance (4 vs 2). This seems counter intuitive, too.
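The arithmetic behind the quoted complaint can be sketched in a few lines of plain Java. This is only an illustration under the assumption that the per-match contribution is 1 / (distance + 1), as the "1.0f / (1 + 1)" figure in the quoted text suggests, with the span length used as the distance; the class and method names are hypothetical and this is not a copy of Lucene's scoring code.

```java
class SloppyFreqSketch {
    // Assumed formula, taken from the "1.0f / (1 + 1)" figure quoted above.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Span lengths used as the "distance", following the quoted message.
        System.out.println("single term (length 1): " + sloppyFreq(1));
        System.out.println("brown fox (length 2): " + sloppyFreq(2));
        System.out.println("the quick brown fox (length 4): " + sloppyFreq(4));
    }
}
```

Under that assumption the longer span always contributes less per match than the shorter one, which is the effect the quoted message calls counter intuitive.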
you have to clarify what you mean ...
if you're talking about a SpanNearQuery
: Subject: index large size file
: In-Reply-To: 49b5fc5e.10...@r.email.ne.jp
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if
I suppose SpanTermQuery could override the weight/scorer methods so that
it behaved more like a TermQuery if it was executed directly ... but
that's really not what it's intended for.
This is currently the only way to boost a term via payloads.
BoostingTermQuery extends SpanTermQuery.
if
Hi Mark,
markharw00d schrieb:
Hi Niels,
See the javadocs for IndexWriter.setRAMBufferSizeMB()
I tried different settings. Apart from the fact that my memory issue
seems to be my own fault, I'm wondering what Lucene does in the
background. Apparently it does flush(), but not commit()?
At
Hi,
What do you mean by an untokenized field?
Are you using different analyzers for different fields? If yes, I think
you should just use the same analyzer (PerFieldAnalyzerWrapper, I guess) for the query.
Li
-Original Message-
From: rokham [mailto:somebodyik...@gmail.com]
Sent: Monday, March 09, 2009 11:02 PM