[ https://issues.apache.org/jira/browse/WSCOMMONS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660198#action_12660198 ]

Andreas Veithen commented on WSCOMMONS-424:
-------------------------------------------

Additional issue:

If the DataHandler was constructed from an Object rather than a DataSource, a 
call to DataSource#getInputStream() will start a new thread and return a 
PipedInputStream. This is so for Geronimo's as well as Sun's JAF implementation. 
The reason is that DataContentHandler only has a writeTo and no getInputStream 
method. Obviously starting a new thread just to check the size of the data is 
an overhead that should be avoided.
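
One conceivable way to avoid the extra thread, sketched here only as an 
illustration and not as the fix adopted for this issue, is to pass a counting 
OutputStream to writeTo and abort as soon as the limit is passed. The names 
CountingSketch, Writable and LimitExceeded are hypothetical; Writable merely 
stands in for an object with a writeTo(OutputStream) method such as 
DataHandler, so that the sketch compiles without the JAF on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;

class CountingSketch {
    /** Hypothetical stand-in for anything offering writeTo(OutputStream), e.g. a DataHandler. */
    interface Writable {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical marker exception used to abort writeTo once the limit is passed. */
    static final class LimitExceeded extends IOException {
        private static final long serialVersionUID = 1L;
    }

    /** Returns true if the source writes more than limit bytes; no thread or pipe is involved. */
    static boolean exceedsLimit(Writable source, final long limit) throws IOException {
        OutputStream counter = new OutputStream() {
            private long count;

            @Override
            public void write(int b) throws IOException {
                if (++count > limit) {
                    throw new LimitExceeded();
                }
            }

            @Override
            public void write(byte[] buf, int off, int len) throws IOException {
                count += len;
                if (count > limit) {
                    throw new LimitExceeded();
                }
            }
        };
        try {
            source.writeTo(counter);
            return false; // writeTo finished without passing the limit
        } catch (LimitExceeded e) {
            return true;  // aborted early: more than limit bytes were written
        }
    }
}
```

This assumes that the writer lets the IOException propagate; an implementation 
that swallows or wraps exceptions would need different handling.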

> BufferUtils#doesDataHandlerExceedLimit needs review
> ---------------------------------------------------
>
>                 Key: WSCOMMONS-424
>                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-424
>             Project: WS-Commons
>          Issue Type: Bug
>          Components: AXIOM
>            Reporter: Andreas Veithen
>
> The code in BufferUtils#doesDataHandlerExceedLimit has several issues and 
> should be reviewed:
> 1) The code never closes the InputStream requested from the DataSource. This 
> might have unexpected consequences if the DataSource is a FileDataSource.
> 2) The code assumes that there are DataSources that can only be read once. 
> Indeed the code in BufferUtils#getInputStream throws an exception if the 
> input stream returned from the DataSource doesn't support mark ("Stream does 
> not support mark, Cannot read the stream as DataSource will be consumed."). 
> This is plain wrong, because by definition a DataSource can be read several 
> times (this is the very reason for the existence of this interface). If there 
> are DataSource implementations that can be "consumed", i.e. read only once, 
> they need to be fixed.
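> 3) The code assumes that the end of stream is reached when 
> InputStream#available() returns 0. This is wrong: available() only estimates 
> the number of bytes that can be read without blocking and may return 0 
> before the end of the stream. Only a read that returns -1 signals the end.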
> 4) doesDataHandlerExceedLimit tries to establish a lower bound on the 
> DataSource size by reading data from it. This is suboptimal, because in most 
> cases this can be achieved without actually reading a single byte from the 
> data source:
> * If the DataSource is a FileDataSource, it is possible to get the File 
> object and the size of the DataSource can be determined from the file size. 
> This is much less costly than to open the file and read data from it.
> * InputStream#available() can always be used to get a lower limit on the 
> stream size. For a ByteArrayDataSource this actually returns the size 
> directly.
> * InputStream#skip can be used to advance in the stream without reading from 
> it.
> Actually reading the data is only necessary if the InputStream 
> implementation returned by the DataSource implements neither available nor 
> skip (this is possible).
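
The strategy described in point 4 could look roughly like the following. This 
is a minimal sketch under stated assumptions, not the actual BufferUtils code: 
the class SizeLimitSketch and both exceedsLimit methods are hypothetical 
names, limit is assumed to be non-negative, and the sketch works on a plain 
File and InputStream so that it needs nothing beyond java.io. With a 
FileDataSource, the File would come from FileDataSource#getFile(); otherwise 
the stream would come from DataSource#getInputStream().

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

class SizeLimitSketch {
    /** File case: the size is known without opening the file. */
    static boolean exceedsLimit(File file, long limit) {
        return file.length() > limit;
    }

    /**
     * Stream case: returns true if the stream holds more than limit bytes.
     * The stream is always closed (point 1); the caller requests a fresh
     * stream from the DataSource when it needs the data itself.
     */
    static boolean exceedsLimit(InputStream in, long limit) throws IOException {
        try (InputStream stream = in) {
            long seen = 0;
            while (true) {
                // available() is a lower bound on the remaining bytes
                if (seen + stream.available() > limit) {
                    return true;
                }
                // Try to advance without reading, but never past the limit
                long toSkip = limit - seen;
                long skipped = toSkip > 0 ? stream.skip(toSkip) : 0;
                if (skipped > 0) {
                    seen += skipped;
                    continue;
                }
                // No progress from skip: read one byte, since only a return
                // value of -1 reliably signals the end of the stream (point 3)
                if (stream.read() == -1) {
                    return false;
                }
                if (++seen > limit) {
                    return true;
                }
            }
        }
    }
}
```

For a ByteArrayInputStream the first available() call already settles the 
question. The single-byte read is only reached when skip makes no progress or 
when exactly limit bytes have been seen.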

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
