[ https://issues.apache.org/jira/browse/LUCENE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559887#action_12559887 ]
Michael McCandless commented on LUCENE-1084:
--------------------------------------------
{quote}
This kind of limit is common on web search engines. It prevents really big
pages that crawlers find from blowing up indexing and search (think of a
100MB PDF that claims to be a text file). So changing it might indeed hurt
folks who are indexing uncontrolled web content.
{quote}
OK, it seems like an important safeguard, and risky to change, so
let's wait for 3.0.
Maybe we could increase it from 10K to 100K to reduce how often
a legitimate document gets truncated?
{quote}
An alternative to changing the default setting would be to not have a default -
make it a required parameter to the IndexWriter constructor. That way, there is
no silent loss (or gain) of content - the user must specify.
{quote}
I think this is a good idea; it basically forces the user to confront
the truncation issue up front.
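
To make the idea concrete, here is a minimal, self-contained sketch of a writer with no default limit. LimitedFieldIndexer and its methods are hypothetical names made up for illustration, not Lucene's IndexWriter API, and the whitespace split only stands in for a real analyzer. The point is that the only constructor takes the limit, so the caller has to choose between a cap such as 10,000 and Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an index writer; this is NOT Lucene's API. */
final class LimitedFieldIndexer {
    private final int maxFieldLength;
    private int truncatedFields = 0;

    /** The only constructor: there is no default, the caller must pick a limit. */
    LimitedFieldIndexer(int maxFieldLength) {
        if (maxFieldLength <= 0) {
            throw new IllegalArgumentException("maxFieldLength must be positive");
        }
        this.maxFieldLength = maxFieldLength;
    }

    /** Splits on whitespace (assumed tokenization) and keeps at most maxFieldLength terms. */
    List<String> indexField(String text) {
        List<String> kept = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return kept;
        }
        for (String term : trimmed.split("\\s+")) {
            if (kept.size() == maxFieldLength) {
                // Record the truncation so it is no longer silent.
                truncatedFields++;
                break;
            }
            kept.add(term);
        }
        return kept;
    }

    /** Number of fields that lost terms to the limit so far. */
    int truncatedFields() {
        return truncatedFields;
    }

    public static void main(String[] args) {
        // Build a document with one term more than the 10,000 cap.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10_001; i++) {
            doc.append("term").append(i).append(' ');
        }
        LimitedFieldIndexer capped = new LimitedFieldIndexer(10_000);
        LimitedFieldIndexer unlimited = new LimitedFieldIndexer(Integer.MAX_VALUE);
        System.out.println("capped kept:    " + capped.indexField(doc.toString()).size());
        System.out.println("unlimited kept: " + unlimited.indexField(doc.toString()).size());
        System.out.println("truncated:      " + capped.truncatedFields());
    }
}
```

The truncation counter is an extra assumption beyond what was proposed; it is there only to show one way a dropped tail could be made visible to the user.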
> increase default maxFieldLength?
> --------------------------------
>
> Key: LUCENE-1084
> URL: https://issues.apache.org/jira/browse/LUCENE-1084
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.2
> Reporter: Daniel Naber
> Assignee: Michael McCandless
> Fix For: 2.4
>
>
> To my understanding, Lucene 2.3 will easily index large documents. So
> shouldn't we get rid of the 10,000 default limit for the field length? 10,000
> isn't that much and as Lucene doesn't have any error logging by default, this
> is a common problem for users that is difficult to debug if you don't know
> where to look.
> A better new default might be Integer.MAX_VALUE.