[ https://issues.apache.org/jira/browse/SOLR-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045641#comment-15045641 ]
Jason Gerlowski edited comment on SOLR-8266 at 12/7/15 8:21 PM:
----------------------------------------------------------------

Thanks for the early look/help, guys. I've updated the patch to align with your suggestions. The tests all pass now, with the caveat that I had to comment out the testParallelEOF call in testStreams(). Everything else seems to work fine.

While I'm confident in the code I wrote, I do have some questions about how this code works:

1.) I just wanted to double-check my understanding of the {{StreamFactory}} configuration discrepancies that caused the stack trace I mentioned above. {{StreamFactory}} is used to convert between {{TupleStream}}s and {{StreamExpressionParameters}}, and factories are used in both SolrJ ({{ParallelStream}}, for example) and Solr ({{StreamHandler}}). These factories need to be in sync; if they aren't, SolrJ can end up sending serialized values to Solr that it doesn't know what to do with (as in the conflicting "count" types from a few comments ago). I'm just trying to make sure I understand the big picture, so I can make edits with more confidence next time.

2.) I noticed that {{StreamFactory.withFunctionName()}} takes any {{Class}} as its second argument. Is there any value in narrowing the type of that parameter, so that it's harder to misconfigure {{StreamFactory}}? (i.e. {{withFunctionName(String name, Class<? extends Expressible> clazz)}})

3.) StreamingTest has a single test that triggers roughly 20 independent sub-tests (e.g. testParallelEOF, testStatsStream). Updating these tests to use the {{StreamFactory}} was a bit more painful than it probably could've been. With the sub-tests all bundled together, the process for fixing them looked like: (a) run them all, (b) wait for a failure, (c) scroll back through the output to see which clause failed, (d) fix and repeat. Had the sub-tests been separate JUnit tests, identifying the failing pieces and fixing them would've been easier.
Is there a historical reason these tests have all been grouped together (performance/overhead, for example)? If not, do you think there would be any pushback on splitting these test clauses apart (as part of a separate JIRA)?
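Regarding question 1, the sync requirement can be pictured with a toy sketch. Note that {{ToyStreamFactory}}, {{CountStream}}, and {{resolve()}} below are hypothetical stand-ins, not Solr's actual classes; the point is only that the client-side (SolrJ) and server-side (StreamHandler) factories must hold the same name-to-class mappings, or a function name serialized by one side is unresolvable on the other:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Solr's StreamFactory: maps function names
// (as they appear in serialized stream expressions) to implementation classes.
class ToyStreamFactory {
    private final Map<String, Class<?>> functions = new HashMap<>();

    ToyStreamFactory withFunctionName(String name, Class<?> clazz) {
        functions.put(name, clazz);
        return this;
    }

    // Simulates the deserializing side looking up "name(...)": fails if this
    // factory doesn't know the function name the other side serialized.
    Class<?> resolve(String name) {
        Class<?> clazz = functions.get(name);
        if (clazz == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return clazz;
    }
}

// Toy implementation class, standing in for a real stream class.
class CountStream {}

class FactorySyncDemo {
    public static void main(String[] args) {
        // Client-side (SolrJ) and server-side (StreamHandler) factories
        // must register the same name -> class mappings.
        ToyStreamFactory client = new ToyStreamFactory().withFunctionName("count", CountStream.class);
        ToyStreamFactory server = new ToyStreamFactory().withFunctionName("count", CountStream.class);

        // The client serializes a stream as "count(...)"; the server resolves it.
        System.out.println(server.resolve("count").getSimpleName()); // prints "CountStream"
    }
}
```

If the two registrations drift apart (different names, or the same name bound to different classes), the failure only shows up at runtime when the expression is parsed, which matches the stack trace described above.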
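On question 2, the payoff of the narrowed signature is that a misregistration becomes a compile-time error instead of a runtime surprise. A minimal sketch with toy types ({{Expressible}} here is just a marker interface standing in for Solr's, and {{NarrowedFactory}} is hypothetical):

```java
// Toy marker interface standing in for Solr's Expressible.
interface Expressible {}

// Hypothetical factory using the narrowed parameter type proposed above.
class NarrowedFactory {
    private final java.util.Map<String, Class<? extends Expressible>> functions =
            new java.util.HashMap<>();

    // With the bounded signature, registering a class that doesn't implement
    // Expressible is rejected by the compiler.
    NarrowedFactory withFunctionName(String name, Class<? extends Expressible> clazz) {
        functions.put(name, clazz);
        return this;
    }

    Class<? extends Expressible> lookup(String name) {
        return functions.get(name);
    }
}

// Toy stream class that satisfies the bound.
class UniqueStream implements Expressible {}

class NarrowedFactoryDemo {
    public static void main(String[] args) {
        NarrowedFactory f = new NarrowedFactory().withFunctionName("unique", UniqueStream.class);
        System.out.println(f.lookup("unique").getSimpleName()); // prints "UniqueStream"

        // f.withFunctionName("bad", String.class); // would not compile: String is not Expressible
    }
}
```

The cost is that any existing caller passing a raw or unrelated {{Class}} would break, so this would be a (small) source-compatibility change.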
> Remove Java Serialization from the Streaming API
> ------------------------------------------------
>
>                 Key: SOLR-8266
>                 URL: https://issues.apache.org/jira/browse/SOLR-8266
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>         Attachments: SOLR-8266-comment-out-testParallelEOF.patch, SOLR-8266.patch
>
> This is being done mainly for security reasons but it's also architecturally the right thing to do.
> Going forward only Streaming Expressions will be used to serialize Streaming API Objects.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org