What Jan said. If you are getting this error on a _text_ based field, then your data is bad. What it's telling you is that _after_ tokenization, you have a single _term_ that's > 32K which is almost, but not quite totally, useless.
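A rough way to check whether a file really produces such a term is to take the extracted text, split it into tokens, and measure each token in UTF-8 bytes. The sketch below is plain Java with no Solr or Lucene classes. The class and method names are made up for illustration, whitespace splitting is only a stand-in for whatever analyzer your fieldType actually uses, and 32766 is the byte limit reported in the error.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class OversizedTermFinder {
    // Byte limit reported in the error message for a single indexed term.
    static final int MAX_TERM_BYTES = 32766;

    // Splits on whitespace (an assumption standing in for the real analyzer)
    // and returns every token whose UTF-8 encoding is longer than maxBytes.
    static List<String> findOversized(String text, int maxBytes) {
        List<String> oversized = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
                oversized.add(token);
            }
        }
        return oversized;
    }

    public static void main(String[] args) {
        // Simulate "binary garbage": one unbroken run of 40000 characters.
        StringBuilder garbage = new StringBuilder();
        for (int i = 0; i < 40000; i++) {
            garbage.append('x');
        }
        String text = "ordinary words " + garbage + " more words";
        for (String term : findOversized(text, MAX_TERM_BYTES)) {
            String preview = term.substring(0, Math.min(20, term.length()));
            System.out.println("oversized term: " + term.length()
                    + " chars, starts with \"" + preview + "\"");
        }
    }
}
```

If this kind of check flags tokens in the text extracted from your files, looking at where they start usually shows what sort of content is responsible.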
ImagineASingleWordThatRunsOnForMoreThanThirtyTwoThousandCharactersHowWouldThatBeUsefulToEitherSearchOrReturnToTheUserWouldThisSingleWordImTypingBeUsefulAndItIsntEvenCloseToThirtyTwoThousandCharacters.....

So I'd try to find out what it is you're processing that shows you such a large term. It's pretty easy to run Tika on a file in SolrJ, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

There are also web sites that'll process the PDF file through Tika and show you how it parses....

Best,
Erick

On Thu, Jan 24, 2019 at 12:57 AM Jan Høydahl <[email protected]> wrote:
>
> I cannot see why you'd want a single term of 32kb in your index anyway. Can
> you give us examples of what these terms are and how you will search them?
> What kind of files are you indexing, could it be like bad PDFs consisting of
> a bunch of binary garbage?
> Try adding a LengthFilterFactory to your fieldType(s). See
> https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html#length-filter
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 24 Jan 2019, at 06:51, Kranthi Kumar K <[email protected]> wrote:
>
> Thank you Bernd Fehling for your suggested solution. I've tried it by
> adding multiValued="true" to the field in Schema.xml, i.e.
> changed from:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" />
>
> Changed to:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" multiValued="true" />
>
> Even after this change we are still unable to import files of size > 32 kb.
> Please find the solution suggested by Bernd at the below url:
>
> http://lucene.472066.n3.nabble.com/Re-Solr-Size-Limitation-upto-32-kb-limitation-td4421569.html
>
> Bernd Fehling, could you please suggest another alternative solution to
> resolve our issue, which would help us a lot?
>
> Please let me know for any questions.
>
> Thanks & Regards,
> Kranthi Kumar.K,
> Software Engineer,
> Ccube Fintech Global Services Pvt Ltd.,
> Email/Skype: [email protected],
> Mobile: +91-8978078449.
>
>
> From: Kranthi Kumar K
> Sent: Friday, January 18, 2019 4:22 PM
> To: [email protected]; [email protected]
> Cc: Ananda Babu medida <[email protected]>; Srinivasa Reddy Karri
> <[email protected]>; Michelle Ngo <[email protected]>; Ravi Vangala
> <[email protected]>
> Subject: RE: Solr Size Limitation upto 32 kb limitation
>
> Hi team,
>
> Thank you Erick Erickson, Bernd Fehling, Jan Hoydahl for your suggested
> solutions. I’ve tried the suggested ones and still we are unable to import
> files having size >32 kb; it is displaying the same error.
>
> Below link has the suggested solutions. Please have a look once.
>
> http://lucene.472066.n3.nabble.com/Solr-Size-Limitation-upto-32-KB-files-td4419779.html
>
> As per Erick Erickson, I’ve changed the field from a string type to a text
> type and still the issue occurs.
>
> I’ve changed from:
>
> <field name="FileContent" type="string_rev" indexed="true" stored="true" />
>
> Changed to:
>
> <field name="FileContent" type="text" indexed="true" stored="true" />
>
> If we do so, it is showing an error in the log; please find the error in the
> attachment.
>
> If I change to:
>
> <field name="FileContent" type="text_general" indexed="true" stored="true" />
>
> It is not showing any error, but the issue still exists.
>
> As per Jan Hoydahl, I have gone through the link that you have provided and
> checked the ‘requestParsers’ tag in solrconfig.xml.
>
> The requestParsers tag in our application is as follows:
>
> <requestParsers enableRemoteStreaming="true"
>                 multipartUploadLimitInKB="2048000"
>                 formdataUploadLimitInKB="2048"
>                 addHttpRequestToContext="false"/>
>
> The request parsers we are using and the ones in the link you have provided
> are similar. And still we are unable to import files of size >32 kb.
>
> As per Bernd Fehling, we are using Solr 4.10.2. You mentioned:
>
> ‘If you are trying to add larger content then you have to "chop" that
> by yourself and add it as multivalued. Can be done within a self written
> loader.’
>
> I’m a newbie to Solr and I didn’t get what exactly a ‘self written loader’ is.
>
> Could you please provide us sample code that helps us to go further?
>
> Thanks & Regards,
> Kranthi Kumar.K
>
>
> From: Kranthi Kumar K <[email protected]>
> Sent: Thursday, January 17, 2019 12:43 PM
> To: [email protected]; [email protected]
> Cc: Ananda Babu medida <[email protected]>; Srinivasa Reddy Karri
> <[email protected]>; Michelle Ngo <[email protected]>
> Subject: Re: Solr Size Limitation upto 32 kb limitation
>
> Hi Team,
>
> Can we have any updates on the below issue? We are awaiting your reply.
>
> Thanks,
> Kranthi kumar.K
>
> ________________________________
> From: Kranthi Kumar K
> Sent: Friday, January 4, 2019 5:01:38 PM
> To: [email protected]
> Cc: Ananda Babu medida; Srinivasa Reddy Karri
> Subject: Solr Size Limitation upto 32 kb limitation
>
> Hi team,
>
> We are currently using Solr 4.2.1 version in our project and everything is
> going well. But recently, we are facing an issue with Solr Data Import. It is
> not importing the files with size greater than 32766 bytes (i.e., 32 kb) and
> showing 2 exceptions:
>
> java.lang.IllegalArgumentException
> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException
>
> Please find the attached screenshot for reference.
>
> We have searched for solutions in many forums and didn’t find the exact
> solution for this issue. Interestingly, we found in an article that changing
> the type of the ‘field’ from string to ‘text_general’ might solve the issue.
> Please have a look in the below forum:
>
> https://stackoverflow.com/questions/29445323/adding-a-document-to-the-index-in-solr-document-contains-at-least-one-immense-t
>
> Schema.xml:
>
> Changed from:
>
> <field name="text" type="string_rev" indexed="true" stored="false" multiValued="true" />
>
> Changed to:
>
> <field name="text" type="text_general " indexed="true" stored="false" multiValued="true" />
>
> We have tried it but still it is not importing the files > 32 KB or 32766
> bytes.
>
> Could you please let us know the solution to fix this issue? We’ll be
> awaiting your reply.
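
On the ‘self written loader’ question quoted above: as I read Bernd's suggestion, it amounts to splitting the content yourself before it reaches Solr and sending the pieces as separate values of a multiValued field. Here is a minimal sketch of just the chopping step, in plain Java. The class and method names are hypothetical, and it assumes the 32766-byte limit from the error is measured in UTF-8 bytes, so it cuts only on code point boundaries and never in the middle of a character.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
final class ContentChopper {
    // Byte limit reported in the error message.
    static final int MAX_VALUE_BYTES = 32766;

    // Splits content into consecutive chunks whose UTF-8 encoding is at most
    // maxBytes long. Concatenating the chunks gives back the original string.
    static List<String> chop(String content, int maxBytes) {
        if (maxBytes < 4) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int i = 0;
        while (i < content.length()) {
            int codePoint = content.codePointAt(i);
            int size = utf8Length(codePoint);
            // Start a new chunk if this code point would push us over the limit.
            if (currentBytes + size > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.appendCodePoint(codePoint);
            currentBytes += size;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    // Number of bytes UTF-8 needs for one code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < 70000; i++) {
            content.append('a');
        }
        for (String chunk : chop(content.toString(), MAX_VALUE_BYTES)) {
            System.out.println(chunk.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```

In a SolrJ loader each chunk would then be added as one value of the field, for example with `SolrInputDocument.addField("FileContent", chunk)`, and the field needs multiValued="true". Note that this only addresses an untokenized (string-type) field, where the whole value is one term. On a tokenized text field the limit applies to each term after analysis, so chopping the input does not help if a single token is itself that long.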
