What Jan said.

If you are getting this error on a _text_-based field, then your data
is bad. What it's telling you is that _after_ tokenization, you have a
single _term_ longer than 32K (32,766 bytes), which is almost, but not
quite totally, useless.

ImagineASingleWordThatRunsOnForMoreThanThirtyTwoThousandCharactersHowWouldThatBeUsefulToEitherSearchOrReturnToTheUserWouldThisSingleWordImTypingBeUsefulAndItIsntEvenCloseToThirtyTwoThousandCharacters.....

So I'd try to find out what it is you're processing that shows you
such a large term. It's pretty easy to run Tika on a file in SolrJ,
see: https://lucidworks.com/2012/02/14/indexing-with-solrj/

There are also web sites that'll process the PDF file through Tika and
show you how it parses....
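
Once you have the plain text that Tika extracted, a quick scan can point
at the offending term. Below is a minimal Python sketch (not SolrJ), under
a few assumptions: splitting on whitespace only approximates what your
analysis chain really does, the limit is taken to be the 32766 bytes of
UTF-8 mentioned in the original report, and the name find_immense_tokens
is made up for illustration.

```python
import re

# A single indexed term may not exceed this many bytes of UTF-8
# (the figure quoted in the original error report).
MAX_TERM_BYTES = 32766


def find_immense_tokens(text, limit=MAX_TERM_BYTES):
    """Return (char_offset, byte_length, preview) for each oversized token.

    Whitespace splitting is only a rough stand-in for a real tokenizer,
    but a run of 32K+ bytes without a single space is a strong hint
    that the extracted text is binary garbage.
    """
    hits = []
    for match in re.finditer(r"\S+", text):
        token = match.group()
        size = len(token.encode("utf-8"))  # the limit is in bytes, not chars
        if size > limit:
            hits.append((match.start(), size, token[:60]))
    return hits


# Hypothetical usage: a document with one 40,000-character "word" in it.
sample = "normal words " + "x" * 40000 + " more words"
for offset, size, preview in find_immense_tokens(sample):
    print(f"offset {offset}: {size} bytes, starts with {preview[:20]!r}")
```

Run it over the extracted text of each file that fails to import; the
offsets and previews should show whether the huge "word" is base64,
binary junk, or something else that was never meant to be searched.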

Best,
Erick

On Thu, Jan 24, 2019 at 12:57 AM Jan Høydahl <[email protected]> wrote:
>
> I cannot see why you'd want a single term of 32kb in your index anyway. Can 
> you give us examples of what these terms are and how you will search them?
> What kind of files are you indexing, could it be like bad PDFs consisting of 
> a bunch of binary garbage?
> Try adding a lengthFilterFactory to your fieldType(s). See 
> https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html#length-filter
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 24. jan. 2019 kl. 06:51 skrev Kranthi Kumar K 
> <[email protected]>:
>
> Thank you Bernd Fehling for your suggested solution. I've tried it by 
> changing the field definition and setting multiValued to true in the 
> Schema.xml file, i.e., changed from:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" />
>
> Changed to:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" 
> multiValued="true" />
>
> Even after this change, we are still unable to import files larger than 
> 32 KB. Please find the solution suggested by Bernd at the URL below:
>
> http://lucene.472066.n3.nabble.com/Re-Solr-Size-Limitation-upto-32-kb-limitation-td4421569.html
>
> Bernd Fehling, could you please suggest an alternative solution to 
> resolve our issue? That would help us a lot.
>
> Please let me know for any questions.
>
>
> Thanks & Regards,
> Kranthi Kumar.K,
> Software Engineer,
> Ccube Fintech Global Services Pvt Ltd.,
> Email/Skype: [email protected],
> Mobile: +91-8978078449.
>
>
> From: Kranthi Kumar K
> Sent: Friday, January 18, 2019 4:22 PM
> To: [email protected]; [email protected]
> Cc: Ananda Babu medida <[email protected]>; Srinivasa Reddy 
> Karri <[email protected]>; Michelle Ngo 
> <[email protected]>; Ravi Vangala <[email protected]>
> Subject: RE: Solr Size Limitation upto 32 kb limitation
>
> Hi team,
>
> Thank you Erick Erickson, Bernd Fehling, and Jan Hoydahl for your suggested 
> solutions. I’ve tried the suggested ones and we are still unable to import 
> files larger than 32 KB; it displays the same error.
>
> Below link has the suggested solutions. Please have a look once.
>
> http://lucene.472066.n3.nabble.com/Solr-Size-Limitation-upto-32-KB-files-td4419779.html
>
>
> As per Erick Erickson, I’ve changed the field from a string type to a 
> text-based type, and the issue still occurs.
>
> I’ve changed from :
>
> <field name="FileContent" type="string_rev" indexed="true" stored="true" />
>
> Changed to:
>
> <field name="FileContent" type="text" indexed="true" stored="true" />
>
> If we do so, an error appears in the log; please find the error in the 
> attachment.
>
> If I change to:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" />
>
> It does not show any error, but the issue still exists.
>
>
> As per Jan Hoydahl, I have gone through the link that you have provided and 
> checked ‘requestParsers’ tag in solrconfig.xml,
>
>
> RequestParsers tag in our application is as follows:
>
> ‘<requestParsers enableRemoteStreaming="true"
>                     multipartUploadLimitInKB="2048000"
>                     formdataUploadLimitInKB="2048"
>                     addHttpRequestToContext="false"/>’
> The request parsers we are using and the ones in the link you provided are 
> similar, and we are still unable to import files larger than 32 KB.
>
>
> As per Bernd Fehling: we are using Solr 4.10.2. You mentioned,
>
> ‘If you are trying to add larger content then you have to "chop" that
> by yourself and add it as multivalued. Can be done within a self written 
> loader. ’
>
> I’m a newbie to Solr and I didn’t understand what exactly a ‘self written loader’ is.
>
> Could you please provide us sample code, that helps us to go further?
>
>
>
> Thanks & Regards,
> Kranthi Kumar.K,
> Software Engineer,
> Ccube Fintech Global Services Pvt Ltd.,
> Email/Skype: [email protected],
> Mobile: +91-8978078449.
>
>
> From: Kranthi Kumar K <[email protected]>
> Sent: Thursday, January 17, 2019 12:43 PM
> To: [email protected]; [email protected]
> Cc: Ananda Babu medida <[email protected]>; Srinivasa Reddy 
> Karri <[email protected]>; Michelle Ngo 
> <[email protected]>
> Subject: Re: Solr Size Limitation upto 32 kb limitation
>
>
> Hi Team,
>
>
>
> Can we have any updates on the below issue? We are awaiting your reply.
>
>
>
> Thanks,
>
> Kranthi kumar.K
>
> ________________________________
> From: Kranthi Kumar K
> Sent: Friday, January 4, 2019 5:01:38 PM
> To: [email protected]
> Cc: Ananda Babu medida; Srinivasa Reddy Karri
> Subject: Solr Size Limitation upto 32 kb limitation
>
>
> Hi team,
>
>
>
> We are currently using Solr 4.2.1 in our project and everything has been 
> going well. But recently we have been facing an issue with Solr Data Import: 
> it does not import files larger than 32766 bytes (i.e., 32 KB) and 
> shows 2 exceptions:
>
>
>
> java.lang.IllegalArgumentException
> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException
>
>
>
> Please find the attached screenshot for reference.
>
>
>
> We have searched for solutions in many forums and didn’t find an exact 
> solution for this issue. Interestingly, we found an article suggesting that 
> changing the type of the ‘field’ from string to ‘text_general’ might solve 
> the issue. Please have a look at the forum post below:
>
>
>
> https://stackoverflow.com/questions/29445323/adding-a-document-to-the-index-in-solr-document-contains-at-least-one-immense-t
>
>
>
> Schema.xml:
>
> Changed from:
>
> ‘<field name="text" type="string_rev" indexed="true" stored="false" 
> multiValued="true" />’
>
>
>
> Changed to:
>
> ‘<field name="text" type="text_general " indexed="true" stored="false" 
> multiValued="true" />’
>
>
>
> We have tried it, but it still does not import files larger than 32 KB 
> (32766 bytes).
>
>
>
> Could you please let us know the solution to fix this issue? We’ll be 
> awaiting your reply.
>
>
>
>
