The changes I made to the VFS transport and to the message builders for text/plain and application/octet-stream certainly don't provide an out-of-the-box solution for your use case, but they are the prerequisite for one.
Concerning your first proposed solution (let the VFS transport write the content to a temporary file), I don't like it because it would create a tight coupling between the VFS transport and the mediator. A design goal should be that the solution still works if the file comes from another source, e.g. an attachment in an MTOM or SwA message.

I think that an all-Synapse solution (2 or 3) should be possible, but it will require the development of a custom mediator. This mediator would read the content, split it up (storing the chunks in memory or on disk) and execute a sub-sequence for each chunk. The sub-sequence would be executed synchronously, to limit memory/disk space consumption (to the maximum chunk size) and to avoid flooding the destination service.

Note that it is probably not possible to implement the mediator using a script because of the problematic String handling. Also, Spring, POJO and class mediators don't support sub-sequences (I think). Therefore it should be implemented as a full-featured Java mediator, probably taking the existing iterate mediator as a template. I can contribute the required code to get the text content in the form of a java.io.Reader.

Regards,

Andreas

On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
>
> Although this is a good feature, it may not solve the actual problem?
> The main first issue on my list was the memory leak.
> However, the real problem is that once I get this massive file I have to send
> it to a web service that can only take it in small chunks (about 14MB).
> Streaming it straight out would just kill the destination web service. It
> would get the memory error. The text document can be split apart easily, as
> it has independent records on each line separated by <CR> <LF>.
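
For illustration, the line-based splitting discussed here (independent records separated by CR LF, chunks capped at a maximum size, each chunk handed off synchronously before the next one is read) might look roughly like the sketch below. It is only a sketch under stated assumptions: LineChunker and forEachChunk are hypothetical names, not Synapse APIs; the handler stands in for the synchronous execution of a sub-sequence; the chunk size is counted in characters rather than bytes; and every record is re-terminated with CR LF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical helper, not part of Synapse: splits line-oriented text into bounded chunks.
final class LineChunker {

    /**
     * Reads records (lines) from the reader and hands them to the handler in
     * chunks of at most maxChunkChars characters. A record is never split, so a
     * single record longer than the limit is delivered as a chunk of its own.
     * The handler is called synchronously, so at most one chunk is held in memory.
     *
     * @return the number of chunks delivered
     */
    static int forEachChunk(Reader in, int maxChunkChars, Consumer<String> handler)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumption: records are CR LF terminated, so the terminator is restored as CR LF.
            String record = line + "\r\n";
            // Flush the current chunk if this record would push it over the limit.
            if (chunk.length() > 0 && chunk.length() + record.length() > maxChunkChars) {
                handler.accept(chunk.toString());
                count++;
                chunk.setLength(0);
            }
            chunk.append(record);
        }
        // Deliver whatever is left after the last record.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Three 6-character records with a 12-character limit.
        Reader in = new StringReader("rec1\r\nrec2\r\nrec3\r\n");
        int n = forEachChunk(in, 12, c -> System.out.println("chunk of " + c.length() + " chars"));
        System.out.println(n + " chunks");
    }
}
```

In a real mediator the handler would run the sub-sequence for the chunk and return only once the destination service has been called, which is what keeps memory use bounded and throttles the calls.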
>
> In an earlier post, which was not responded to, I mentioned:
>
> "Otherwise; for large EDI files a VFS iterator Mediator that streams through
> input file and outputs smaller chunks for processing, in Synapse, may be a
> solution?"
>
> So I had mentioned a few solutions in prior posts. The solutions now are:
>
> 1) VFS writes straight to a temporary file, then a Java mediator can process
>    the file by splitting it into many smaller files. These files then trigger
>    another VFS proxy that submits them to the final web service.
>    The problem is that it uses the file system (not so bad).
> 2) A Java mediator takes the <text> package and splits it up by wrapping it
>    into many XML <data> elements that can then be acted on by a Synapse
>    iterator. So replace the text message with many smaller XML elements.
>    The problem is that this loads the whole message into memory.
> 3) Create another iterator in Synapse that works on a regular expression (to
>    split the text data) or actually uses a for-loop approach to chop the file
>    into chunks based on the loop index value. E.g. index = 23 means a 14K chunk
>    23 chunks into the data.
> 4) Using the approach proposed now: just submit the file straight (stream
>    it) to another web service that chops it up. It may return an XML document
>    with many sub-elements that allows the standard iterator to work. Similar
>    to (2) but using another service rather than Java to split the document.
> 5) Using the approach proposed now: just submit the file straight (stream
>    it) to another web service that chops it up but calls a Synapse proxy with
>    each small packet of data, which then forwards it to the final web service.
>    So the web service iterates across the data, and not Synapse.
>
> Then other solutions replace Synapse with a standalone Java program at the
> front end.
>
> Another issue here is throttling: splitting the file is one issue, but
> submitting hundreds of calls in parallel to the destination service would
> result in timeouts...
> So we need to work in throttling.
>
>
> Ruwan Linton wrote:
>>
>> I agree and can understand the time factor, and also +1 for reusing stuff
>> rather than trying to reinvent the wheel :-)
>>
>> Thanks,
>> Ruwan
>>
>> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen <[email protected]> wrote:
>>
>>> Ruwan,
>>>
>>> It's not a question of possibility, it is a question of available time :-)
>>>
>>> Also note that some of the features that we might want to implement
>>> have some similarities with what is done for attachments in Axiom
>>> (except that an attachment is only available once, while a file over
>>> VFS can be read several times). I think there is also some existing
>>> code in Axis2 that might be useful. We should not reimplement these
>>> things but try to make the existing code reusable. This however is
>>> only realistic for the next release after 1.3.
>>>
>>> Andreas
>>>
>>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]> wrote:
>>> > Andreas,
>>> >
>>> > Can we have the caching at the file system as a property, to support
>>> > multiple layers touching the full message, and is it possible to
>>> > specify a threshold for streaming? For example, if the message is touched
>>> > several times we might still need streaming, but not for files of 100KB
>>> > or less.
>>> >
>>> > Thanks,
>>> > Ruwan
>>> >
>>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <[email protected]> wrote:
>>> >>
>>> >> I've done an initial implementation of this feature. It is available
>>> >> in trunk and should be included in the next nightly build.
>>> >> In order to enable this in your configuration, you need to add the
>>> >> following property to the proxy:
>>> >>
>>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>>> >>
>>> >> You also need to add the following mediators just before the <send>
>>> >> mediator:
>>> >>
>>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>>> >> <property action="set" name="OUT_ONLY" value="true"/>
>>> >>
>>> >> With this configuration Synapse will stream the data directly from the
>>> >> incoming to the outgoing transport without storing it in memory or in
>>> >> a temporary file. Note that this has two other side effects:
>>> >> * The incoming file (or connection in case of a remote file) will only
>>> >>   be opened on demand. In this case this happens during execution of the
>>> >>   <send> mediator.
>>> >> * If during the mediation the content of the file is needed several
>>> >>   times (which is not the case in your example), it will be read several
>>> >>   times. The reason is of course that the content is not cached.
>>> >>
>>> >> I tested the solution with a 2GB file and it worked fine. The
>>> >> performance of the implementation is not yet optimal, but at least the
>>> >> memory consumption is constant.
>>> >>
>>> >> Some additional comments:
>>> >> * The transport.vfs.Streaming property has no impact on XML and SOAP
>>> >>   processing: this type of content is processed exactly as before.
>>> >> * With the changes described here, we now have two different policies
>>> >>   for plain text and binary content processing: in-memory caching + no
>>> >>   streaming (transport.vfs.Streaming=false) and no caching + deferred
>>> >>   connection + streaming (transport.vfs.Streaming=true). Probably we
>>> >>   should define a wider range of policies in the future, including file
>>> >>   system caching + streaming.
>>> >> * It is necessary to remove the transportNonBlocking property
>>> >>   (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send> mediator
>>> >>   (more precisely the OperationClient) from executing the outgoing
>>> >>   transport in a separate thread. This property is set by the incoming
>>> >>   transport. I think this is a bug since I don't see any valid reason
>>> >>   why the transport that handles the incoming request should determine
>>> >>   the threading behavior of the transport that sends the outgoing
>>> >>   request to the target service. Maybe Asankha can comment on this?
>>> >>
>>> >> Andreas
>>> >>
>>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]> wrote:
>>> >> >
>>> >> > That's good, as this stops us using Synapse.
>>> >> >
>>> >> >
>>> >> > Asankha C. Perera wrote:
>>> >> >>
>>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>>> >> >>>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>>> >> >>>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>>> >> >>>     at java.lang.StringBuffer.append(StringBuffer.java:307)
>>> >> >>>     at java.io.StringWriter.write(StringWriter.java:72)
>>> >> >>>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>>> >> >>>     at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>>> >> >>>     at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>>> >> >>>
>>> >> >> As I see it, since the content type is text, the plain text builder is
>>> >> >> trying to read the whole content into a String, which is a problem for
>>> >> >> large content.
>>> >> >>
>>> >> >> A definite bug we need to fix ..
>>> >> >>
>>> >> >> cheers
>>> >> >> asankha
>>> >> >>
>>> >> >> --
>>> >> >> Asankha C. Perera
>>> >> >> AdroitLogic, http://adroitlogic.org
>>> >> >>
>>> >> >> http://esbmagic.blogspot.com
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: [email protected]
>>> >> >> For additional commands, e-mail: [email protected]
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > View this message in context:
>>> >> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>> >> >
>>> >>
>>> >
>>> > --
>>> > Ruwan Linton
>>> > http://wso2.org - "Oxygenating the Web Services Platform"
>>> > http://ruwansblog.blogspot.com/
>>> >
>>>
>>
>> --
>> Ruwan Linton
>> http://wso2.org - "Oxygenating the Web Services Platform"
>> http://ruwansblog.blogspot.com/
>>
>
> --
> View this message in context:
> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
> Sent from the Synapse - Dev mailing list archive at Nabble.com.
