[
https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748184#action_12748184
]
frank farmer commented on SOLR-1091:
------------------------------------
My concern is not that Solr does anything specific with this garbled data, only
that wt=phps always returns a string that can be run through unserialize()
without error.
Here's the exact case in which I encountered this bug, which may help explain
why I reported this issue in the first place:
1) Somehow, a user inserted the aforementioned sequence of bytes in some
user-editable content in my application.
2) My code blindly passed that data directly into solr (in retrospect, I should
probably be filtering anything that's not valid UTF-8)
3) Users ran queries which included the affected document
4) My code tried to unserialize() the output, and failed with a PHP error
(simply replacing the offending "s:4:" with "s:6:" caused the output to
unserialize without issue, however). This caused my users to be unable to
retrieve results for many queries.
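
To illustrate why the "s:4:" prefix in step 4 breaks parsing while "s:6:" does not, here is a minimal sketch in Python (not PHP's actual implementation). It assumes only the documented shape of a serialized PHP string, s:N:"<N bytes>";, and the helper name php_string_length_ok is hypothetical:

```python
import re

def php_string_length_ok(token: bytes) -> bool:
    # Hypothetical helper: checks one serialized PHP string token of the
    # form s:N:"<payload>"; where N must equal the payload length in bytes.
    header = re.match(rb's:(\d+):"', token)
    if header is None:
        return False
    declared = int(header.group(1))
    start = header.end()
    # After exactly `declared` payload bytes, the closing quote and
    # semicolon must follow; otherwise the declared length is wrong.
    return token[start + declared:start + declared + 2] == b'";'

payload = b"\xED\xAF\x80\xED\xB1\xB8"  # the 6 bytes from the issue description
bad = b's:4:"' + payload + b'";'       # length as reported from wt=phps
good = b's:6:"' + payload + b'";'      # length after the manual fix

print(php_string_length_ok(bad))   # False
print(php_string_length_ok(good))  # True
```

With a declared length of 4, the parser lands in the middle of the payload instead of on the closing quote, which is the kind of mismatch that makes unserialize() fail.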
Long story short, if you let users insert arbitrary byte sequences into your
index (which I'll admit is naive, but I'm sure I'm not the only one who's done
this), and you use wt=phps, a malicious user can effectively cause a DoS.
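
As a rough sketch of the client-side filtering mentioned in step 2, the following Python replaces anything that is not valid UTF-8 before the data is sent for indexing. The function name sanitize_utf8 and the sample input are hypothetical, and this only covers the application side; it does not change what the phps writer does:

```python
def sanitize_utf8(raw: bytes) -> str:
    # Strict UTF-8 decoding rejects encoded surrogates such as ED AF 80;
    # errors="replace" substitutes U+FFFD for the bytes it rejects.
    return raw.decode("utf-8", errors="replace")

suspect = b"title \xED\xAF\x80\xED\xB1\xB8"  # hypothetical user input
clean = sanitize_utf8(suspect)

# The cleaned text now round-trips through a strict UTF-8 encode/decode.
assert clean.encode("utf-8").decode("utf-8") == clean
```

Replacing rather than rejecting keeps the rest of the user's content intact, at the cost of losing the offending bytes.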
Again, I don't care about actually getting these bytes back out of solr
unmangled. I only care that the output of wt=phps make it through
unserialize() without causing a PHP error.
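
One plausible reading of the 4-versus-6 discrepancy, which is an assumption and not something confirmed in this issue: the six bytes are two 3-byte sequences that each encode a UTF-16 surrogate, and the pair combines into a single code point whose standard UTF-8 form is 4 bytes long. The arithmetic can be checked with a short Python sketch:

```python
raw = b"\xED\xAF\x80\xED\xB1\xB8"  # the 6 bytes from the issue description

# "surrogatepass" lets Python decode each 3-byte sequence as a lone surrogate.
high, low = (ord(c) for c in raw.decode("utf-8", "surrogatepass"))

# Standard UTF-16 surrogate-pair combination.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
utf8 = chr(code_point).encode("utf-8")

print(hex(high), hex(low))  # 0xdbc0 0xdc78
print(hex(code_point))      # 0x100078
print(len(raw), len(utf8))  # 6 4
```

If the declared length were computed from the 4-byte form while the 6-byte form is what gets written, the output would show exactly the mismatch reported here.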
> "phps" (serialized PHP) writer produces invalid output
> ------------------------------------------------------
>
> Key: SOLR-1091
> URL: https://issues.apache.org/jira/browse/SOLR-1091
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 1.3
> Environment: Sun JRE 1.6.0 on Centos 5
> Reporter: frank farmer
> Priority: Minor
> Fix For: 1.4
>
>
> The serialized PHP output writer can output invalid string lengths for
> certain (unusual) input values. Specifically, I had a document containing
> the following 6 byte character sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without
> issue; however, when fetching the document back out using the serialized PHP
> writer, it returns a string like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6
> bytes long.
> When using PHP's native serialize() function, it correctly sets the length to
> 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't
> have any trouble with this string.