Hi James,

That long query looks suspiciously like a "directory traversal" attack that someone tried (unsuccessfully) to run against your system in the past using the search page. For example: https://www.acunetix.com/websitesecurity/directory-traversal/ (Notice that the query you shared contained "win.ini", a file that directory traversal attempts commonly try to reach.)

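One way to check this, as a minimal sketch: the Solr error quoted further down in this thread prints the first bytes of the oversized "query" term as decimal UTF-8 values. Decoding them shows how the string begins. The helper name decode_prefix is only illustrative and is not part of DSpace or Solr.

```python
# Byte values copied from "The prefix of the first immense term is: [...]"
# in the Solr error quoted later in this thread.
prefix_bytes = [
    117, 110, 101, 120, 105, 115, 116, 105, 110, 103,
    47, 46, 46, 47, 46, 46, 47, 46, 46, 47,
    46, 46, 47, 46, 46, 47, 46, 46, 47, 46,
]


def decode_prefix(values):
    """Turn a list of UTF-8 byte values back into readable text."""
    return bytes(values).decode("utf-8", errors="replace")


decoded = decode_prefix(prefix_bytes)
print(decoded)  # prints: unexisting/../../../../../../.
```

The repeated "../" segments are the usual shape of a directory traversal attempt, which fits the "win.ini" observation above.
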
This sort of attack won't work on DSpace (so there's nothing to worry about). But it might have been logged in your statistics because it was attempted from the DSpace search page.

Overall, my opinion is you may just want to *delete* this entry from your exported CSV. It doesn't look like a valid statistical entry that you'd want to "count". It looks like someone was attempting to attack your site (and failing to do so).

Tim

On Tuesday, September 12, 2023 at 4:56:51 PM UTC-5 James Holobetz wrote:

> During our move from DSpace 6.x to DSpace 7.x we had to combine solr shards and then use the UUIDfix tool to convert the old DSpace Object ID to UUID. Anyways, I saved all the csv files for solr ingest and went looking through them for clues about the "query" in question. The solr-export-statistics dump from 6.3 looks different from the solr-export-statistics dump from 7.6 for the query in question.
>
> On Tue, Sep 12, 2023 at 2:14 PM James Holobetz <[email protected]> wrote:
>
>> I have found the "query" string in question in the particular csv file that was dumped (solr-export-statistics) from our DSpace 7.6 production machine. I have attached the relevant files to help as to any clue what may be happening.
>>
>> Thank you,
>>
>> James
>>
>> On Tue, Sep 12, 2023 at 11:12 AM DSpace Technical Support <[email protected]> wrote:
>>
>>> Hi James,
>>>
>>> I have to admit, I've never seen that error before. My guess is there's something odd/different (or incorrect) with the data that you are trying to import. But, I don't know what it could be. That error mentions the "query" field is the problematic one. Have you looked at the data you are trying to import to see why that "query" field is so long? Maybe something is incorrect in that import data, or maybe it's encoded improperly and the script is stumbling over it?
>>>
>>> Tim
>>>
>>> On Monday, September 11, 2023 at 3:49:56 PM UTC-5 James Holobetz wrote:
>>>
>>>> (I sent it early by mistake)
>>>>
>>>> https://mail.google.com/mail/u/0/?tab=rm&ogbl#search/Exception+writing+document+id/FMfcgzGtvsbMhlrcPsHZXVSJjRcZSsvW
>>>>
>>>> https://stackoverflow.com/questions/37070593/how-to-deal-with-document-contains-at-least-one-immense-term-in-solr
>>>>
>>>> 1) What would cause this (on the production machine)?
>>>>
>>>> 2) How do I resolve this issue?
>>>>
>>>> Thank you
>>>>
>>>> On Mon, Sep 11, 2023 at 2:46 PM James Holobetz <[email protected]> wrote:
>>>>
>>>>> I am moving data from our production dspace 7.6 server to our development dspace 7.6 server and I am repeatedly receiving this error:
>>>>>
>>>>> holobetj dspace $ dsp /opt/dspace/bin/dspace solr-import-statistics -c
>>>>> No index name provided, defaulting to "statistics".
>>>>> Exception: Error from server at http://localhost:8983/solr/statistics: Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the index; possible analysis error: Document contains at least one immense term in field="query" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[117, 110, 101, 120, 105, 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at most 32766 in length; got 34396.
>>>>> Perhaps the document has an indexed string field (solr.StrField) which is too large
>>>>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics: Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the index; possible analysis error: Document contains at least one immense term in field="query" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[117, 110, 101, 120, 105, 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at most 32766 in length; got 34396.
>>>>> Perhaps the document has an indexed string field (solr.StrField) which is too large
>>>>>
>>>>> Looking in the forums here I have seen the error very rarely:
>>>>>
>>> --
>>> All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>> ---
>>> You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/6810f574-69e6-4676-95ec-717b4ca22a72n%40googlegroups.com
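
As a rough sketch of the cleanup suggested in the reply at the top of this thread (removing the bogus entry from the exported CSV before re-importing), the script below copies a CSV and diverts any row whose "query" value is longer than the 32766-byte limit named in the Solr error. It rests on assumptions that should be checked against the real export: that the CSV has a header row, that the column is literally named "query", and that the file is UTF-8. The function name and file names are hypothetical, and this is not a DSpace tool.

```python
import csv

MAX_TERM_BYTES = 32766  # limit reported in the Solr error message


def split_oversized_rows(src_path, kept_path, rejected_path, field="query"):
    """Copy rows to kept_path, diverting rows whose `field` is too long.

    Assumes a header row and UTF-8 encoding (unverified for DSpace exports).
    Returns (kept_count, rejected_count).
    """
    # The csv module's default per-field limit could be too small for
    # other abnormally long values, so raise it before reading.
    csv.field_size_limit(10_000_000)
    kept = rejected = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(kept_path, "w", newline="", encoding="utf-8") as kept_f, \
         open(rejected_path, "w", newline="", encoding="utf-8") as rej_f:
        reader = csv.DictReader(src)
        kept_w = csv.DictWriter(kept_f, fieldnames=reader.fieldnames)
        rej_w = csv.DictWriter(rej_f, fieldnames=reader.fieldnames)
        kept_w.writeheader()
        rej_w.writeheader()
        for row in reader:
            value = row.get(field) or ""
            # Solr measures the term in UTF-8 bytes, not characters.
            if len(value.encode("utf-8")) > MAX_TERM_BYTES:
                rej_w.writerow(row)
                rejected += 1
            else:
                kept_w.writerow(row)
                kept += 1
    return kept, rejected


# Hypothetical usage; replace the file names with the real export files:
# split_oversized_rows("statistics_export.csv", "cleaned.csv", "rejected.csv")
```

Writing the rejected rows to a separate file, rather than discarding them, keeps a record of what was removed so it can be reviewed before the import is rerun.
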
