After looking into it, I suspected the same thing. The main problem seems
to be that DSpace 7.6, while exporting the statistics, added an escape
character (\) in front of each path separator (also \), which probably
pushed the "query" value past the size limit.
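
As a quick sanity check on the attack theory, the byte prefix quoted in the
Solr error can be decoded back into text. This is only a minimal Python
sketch; decode_prefix is a hypothetical helper name, and the byte values are
copied from the error message further down in this thread:

```python
# Byte prefix copied from the "immense term" error message in this thread.
prefix = [117, 110, 101, 120, 105, 115, 116, 105, 110, 103,
          47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46,
          47, 46, 46, 47, 46, 46, 47, 46]


def decode_prefix(byte_values):
    """Turn the list of byte values from the Solr error back into text."""
    return bytes(byte_values).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_prefix(prefix))  # unexisting/../../../../../../.
```

The repeated "../" segments are what you would expect from a directory
traversal probe.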

I am just going to delete that record altogether from our production system
so that future Solr exports do not produce the same error when syncing our
development machine.
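
For anyone who would rather clean the exported CSV than touch production,
here is a rough Python sketch of that idea. It assumes the export has a
header row with "uid" and "query" columns (an assumption about the export
layout, not something I checked against the DSpace code). The function names
are made up for illustration, and 32766 is the limit quoted in the error.

```python
import csv
import io

# Maximum size of a single indexed term, as quoted in the Solr error.
MAX_TERM_BYTES = 32766


def split_oversized(rows, field="query", limit=MAX_TERM_BYTES):
    """Split rows into (kept, dropped) by the UTF-8 size of one field."""
    kept, dropped = [], []
    for row in rows:
        value = row.get(field) or ""
        # The limit is in bytes, so measure the encoded length.
        if len(value.encode("utf-8")) > limit:
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped


def filter_csv(src, dst, field="query", limit=MAX_TERM_BYTES):
    """Copy CSV rows from src to dst, omitting rows whose field is too large.

    Returns the dropped rows so they can be reviewed afterwards.
    """
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []
    kept, dropped = split_oversized(reader, field, limit)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return dropped


if __name__ == "__main__":
    # Tiny in-memory stand-in for an exported statistics CSV (made-up data).
    sample = io.StringIO()
    w = csv.writer(sample)
    w.writerow(["uid", "query"])
    w.writerow(["a", "open access"])
    w.writerow(["b", "unexisting/" + "../" * 12000])
    sample.seek(0)

    out = io.StringIO()
    for row in filter_csv(sample, out):
        print("dropped", row["uid"], len(row["query"].encode("utf-8")), "bytes")
```

Rewriting the file with Python's csv module may change how fields are
quoted, so it is worth comparing a few rows against the original before
running solr-import-statistics on the result.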

Thanks for your help Tim!

James

On Wed, Sep 13, 2023 at 10:25 AM DSpace Technical Support <
[email protected]> wrote:

> Hi James,
>
> That long query looks suspiciously like a "directory traversal" attack
> that someone tried to (unsuccessfully) run against your system in the past
> using the search page.  For example:
> https://www.acunetix.com/websitesecurity/directory-traversal/  (Notice
> how the query you shared had "win.ini" which is a common directory
> traversal attack attempt)
>
> This sort of attack won't work on DSpace (so there's nothing to worry
> about). But it might have been logged in your statistics because it was
> attempted from the DSpace search page.
>
> Overall, my opinion is you may just want to *delete* this entry from your
> exported CSV.  It doesn't look like a valid statistical entry that you'd
> want to "count".  It looks like someone was attempting to attack your site
> (and failing to do so).
>
> Tim
>
> On Tuesday, September 12, 2023 at 4:56:51 PM UTC-5 James Holobetz wrote:
>
>> During our move from DSpace 6.x to DSpace 7.x we had to combine solr
>> shards and then use the UUIDfix tool to convert the old DSpace Object ID to
>> UUID. Anyways, I saved all the csv files for solr ingest and went looking
>> through them for clues about the "query" in question. The
>> solr-export-statistics dump from 6.3 looks different from the
>> solr-export-statistics dump from 7.6 for the query in question.
>>
>>
>>
>> On Tue, Sep 12, 2023 at 2:14 PM James Holobetz <[email protected]> wrote:
>>
>>> I have found the "query" string in question in the particular csv file
>>> that was dumped (solr-export-statistics) from our DSpace 7.6 production
>>> machine. I have attached the relevant files in case they offer any clue
>>> as to what may be happening.
>>>
>>> Thank you,
>>>
>>> James
>>>
>>> On Tue, Sep 12, 2023 at 11:12 AM DSpace Technical Support <
>>> [email protected]> wrote:
>>>
>>>> Hi James,
>>>>
>>>> I have to admit, I've never seen that error before.  My guess is
>>>> there's something odd/different (or incorrect) with the data that you are
>>>> trying to import.  But, I don't know what it could be.  That error mentions
>>>> the "query" field is the problematic one.  Have you looked at the data you
>>>> are trying to import to see why that "query" field is so long?  Maybe
>>>> something is incorrect in that import data, or maybe it's encoded
>>>> improperly and the script is stumbling over it?
>>>>
>>>> Tim
>>>>
>>>> On Monday, September 11, 2023 at 3:49:56 PM UTC-5 James Holobetz wrote:
>>>>
>>>>>
>>>>> (I sent it early by mistake)
>>>>>
>>>>>
>>>>> https://mail.google.com/mail/u/0/?tab=rm&ogbl#search/Exception+writing+document+id/FMfcgzGtvsbMhlrcPsHZXVSJjRcZSsvW
>>>>>
>>>>>
>>>>> https://stackoverflow.com/questions/37070593/how-to-deal-with-document-contains-at-least-one-immense-term-in-solr
>>>>>
>>>>>
>>>>> 1) What would cause this (on the production machine)?
>>>>>
>>>>> 2) How do I resolve this issue?
>>>>>
>>>>> Thank you
>>>>>
>>>>> On Mon, Sep 11, 2023 at 2:46 PM James Holobetz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I am moving data from our production dspace 7.6 server to our
>>>>>> development dspace 7.6 server and I am repeatedly receiving this error:
>>>>>>
>>>>>> holobetj dspace $ dsp /opt/dspace/bin/dspace solr-import-statistics -c
>>>>>> No index name provided, defaulting to "statistics".
>>>>>> Exception: Error from server at http://localhost:8983/solr/statistics:
>>>>>> Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the
>>>>>> index; possible analysis error: Document contains at least one immense 
>>>>>> term
>>>>>> in field="query" (whose UTF8 encoding is longer than the max length 
>>>>>> 32766),
>>>>>> all of which were skipped.  Please correct the analyzer to not produce 
>>>>>> such
>>>>>> terms.  The prefix of the first immense term is: '[117, 110, 101, 120, 
>>>>>> 105,
>>>>>> 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46,
>>>>>> 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at 
>>>>>> most
>>>>>> 32766 in length; got 34396. Perhaps the document has an indexed string
>>>>>> field (solr.StrField) which is too large
>>>>>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>>>>>> Error from server at http://localhost:8983/solr/statistics:
>>>>>> Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the
>>>>>> index; possible analysis error: Document contains at least one immense 
>>>>>> term
>>>>>> in field="query" (whose UTF8 encoding is longer than the max length 
>>>>>> 32766),
>>>>>> all of which were skipped.  Please correct the analyzer to not produce 
>>>>>> such
>>>>>> terms.  The prefix of the first immense term is: '[117, 110, 101, 120, 
>>>>>> 105,
>>>>>> 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46,
>>>>>> 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at 
>>>>>> most
>>>>>> 32766 in length; got 34396. Perhaps the document has an indexed string
>>>>>> field (solr.StrField) which is too large
>>>>>>
>>>>>>
>>>>>> Looking in the forums here, I have seen the error very rarely:
>>>>>>
>>>>>> --
>>>> All messages to this mailing list should adhere to the Code of Conduct:
>>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/dspace-tech/6810f574-69e6-4676-95ec-717b4ca22a72n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/dspace-tech/6810f574-69e6-4676-95ec-717b4ca22a72n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>

