[ https://issues.apache.org/jira/browse/WSCOMMONS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660198#action_12660198 ]
Andreas Veithen commented on WSCOMMONS-424:
-------------------------------------------

Additional issue: If the DataHandler was constructed from an Object rather
than a DataSource, a call to DataSource#getInputStream() will start a new
thread and return a PipedInputStream. This is so for Geronimo's as well as
Sun's JAF implementation. The reason is that DataContentHandler only has a
writeTo method and no getInputStream method. Starting a new thread just to
check the size of the data is an overhead that should be avoided.

> BufferUtils#doesDataHandlerExceedLimit needs review
> ---------------------------------------------------
>
>                 Key: WSCOMMONS-424
>                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-424
>             Project: WS-Commons
>          Issue Type: Bug
>          Components: AXIOM
>            Reporter: Andreas Veithen
>
> The code in BufferUtils#doesDataHandlerExceedLimit has several issues and
> should be reviewed:
> 1) The code never closes the InputStream requested from the DataSource. This
> might have unexpected consequences if the DataSource is a FileDataSource.
> 2) The code assumes that there are DataSources that can only be read once.
> Indeed the code in BufferUtils#getInputStream throws an exception if the
> input stream returned from the DataSource doesn't support mark ("Stream does
> not support mark, Cannot read the stream as DataSource will be consumed.").
> This is plain wrong, because by definition a DataSource can be read several
> times (this is the very reason for the existence of this interface). If there
> are DataSource implementations that can be "consumed", i.e. read only once,
> they need to be fixed.
> 3) The code assumes that the end of stream is reached when
> InputStream#available() returns 0. This is wrong: available() only reports
> how many bytes can be read without blocking, and the end of the stream is
> signalled by read() returning -1.
> 4) doesDataHandlerExceedLimit tries to establish a lower bound on the
> DataSource size by reading data from it. This is suboptimal, because in most
> cases this can be achieved without actually reading a single byte from the
> data source:
> * If the DataSource is a FileDataSource, it is possible to get the File
> object and the size of the DataSource can be determined from the file size.
> This is much less costly than opening the file and reading data from it.
> * InputStream#available() can always be used to get a lower limit on the
> stream size. For a ByteArrayDataSource this actually returns the size
> directly.
> * InputStream#skip can be used to advance in the stream without reading from
> it.
> Only if the InputStream implementation returned by the DataSource implements
> neither available nor skip (this is possible) is it necessary to actually
> read the data.
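
To make the strategy from point 4 concrete, here is a minimal sketch of a limit check that uses available() and skip() before falling back to reading. It is not the AXIOM code and not a proposed patch: the class name SizeLimitCheck and the method exceedsLimit are hypothetical, and the sketch works on a plain InputStream so that it needs nothing beyond java.io. A real fix would presumably also special-case FileDataSource (via getFile().length()) before opening any stream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only; not part of AXIOM.
final class SizeLimitCheck {
    private SizeLimitCheck() {}

    /**
     * Returns true if the stream holds more than 'limit' bytes.
     * The caller owns the stream and is responsible for closing it.
     */
    static boolean exceedsLimit(InputStream in, long limit) throws IOException {
        long seen = 0; // bytes known to exist so far
        while (true) {
            // available() is only a lower bound on what remains;
            // a return value of 0 does NOT mean end of stream.
            int avail = in.available();
            if (seen + avail > limit) {
                return true;
            }
            if (avail > 0) {
                // Advance without copying the data into a buffer of our own.
                long skipped = in.skip(avail);
                if (skipped > 0) {
                    seen += skipped;
                    continue;
                }
            }
            // Fallback: only read() can reliably detect end of stream.
            if (in.read() == -1) {
                return false;
            }
            seen++;
            if (seen > limit) {
                return true;
            }
        }
    }
}
```

A caller would open the stream in a try-with-resources block (or close it in a finally block) so that it is always closed, which addresses point 1. Because the check consumes the stream, the caller has to request a fresh InputStream from the DataSource afterwards, which is legitimate under the assumption stated in point 2.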