I committed the code and it will be available in the next WS-Commons transport build. The methods are located in org.apache.axis2.format.ElementHelper in the axis2-transport-base module.
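A minimal sketch of how a custom mediator along the lines discussed further down in this thread might consume the text content once it is available as a java.io.Reader. It reads line-oriented records (CR LF separated, as Kim describes later) and passes them on in bounded chunks, one chunk at a time, so at most one chunk is held in memory. LineChunker and the Consumer callback are hypothetical stand-ins: the callback plays the role of executing a sub-sequence per chunk. Obtaining the Reader from ElementHelper is assumed and not shown, since the helper's exact method signatures are not given here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Synapse or Axis2. It splits line-oriented
// text read from a java.io.Reader into chunks of at most maxChars characters
// (records are never cut in half) and passes each chunk to a callback before
// reading further, so only one chunk is held in memory at a time.
final class LineChunker {

    static int split(Reader source, int maxChars, Consumer<String> onChunk) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder chunk = new StringBuilder();
        int chunks = 0;
        String line;
        while ((line = in.readLine()) != null) {   // readLine drops the line terminator (CR LF, LF or CR)
            int needed = line.length() + 2;          // +2 for the CR LF we add back
            if (chunk.length() > 0 && chunk.length() + needed > maxChars) {
                onChunk.accept(chunk.toString());    // synchronous: returns before we read on
                chunks++;
                chunk.setLength(0);
            }
            chunk.append(line).append("\r\n");
        }
        if (chunk.length() > 0) {                    // flush the last, partially filled chunk
            onChunk.accept(chunk.toString());
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // In a mediator the Reader would come from the message payload;
        // a StringReader stands in for it here.
        List<String> received = new ArrayList<>();
        int n = split(new StringReader("a1\r\nb22\r\nc333\r\n"), 10, received::add);
        System.out.println(n + " chunks");
    }
}
```

A single record longer than maxChars is emitted as its own oversized chunk rather than being split, which keeps records intact at the cost of making maxChars a soft limit.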
Andreas

On Thu, Mar 12, 2009 at 00:06, Kim Horn <[email protected]> wrote:
> Hello Andreas,
> This is great and really helps. I have not had time to try it out yet, but will soon.
>
> Contributing the java.io.Reader would be a great help, but it will take me a while to get up to speed to do the Synapse iterator.
>
> In the short term I am going to use a brute-force approach that is now feasible given that the memory issue is resolved. I just thought of this one today. Use a VFS proxy to FTP the file locally, so streaming helps here. A POJOCommand on <out> splits the file into another directory, streaming in and out. Another independent VFS proxy watches that directory and submits each file to the Web service. Hopefully memory will be fine. Overloading the destination may still be an issue?
>
> Kim
>
> -----Original Message-----
> From: Andreas Veithen [mailto:[email protected]]
> Sent: Monday, 9 March 2009 10:55 PM
> To: [email protected]
> Subject: Re: VFS - Synapse Memory Leak
>
> The changes I did in the VFS transport and the message builders for text/plain and application/octet-stream certainly don't provide an out-of-the-box solution for your use case, but they are the prerequisite.
>
> Concerning your first proposed solution (let the VFS write the content to a temporary file), I don't like this because it would create a tight coupling between the VFS transport and the mediator. A design goal should be that the solution will still work if the file comes from another source, e.g. an attachment in an MTOM or SwA message.
>
> I think that an all-Synapse solution (2 or 3) should be possible, but this will require development of a custom mediator. This mediator would read the content, split it up (and store the chunks in memory or on disk) and execute a sub-sequence for each chunk.
> The execution of the sub-sequence would happen synchronously to limit the memory/disk space consumption (to the maximum chunk size) and to avoid flooding the destination service.
>
> Note that it is probably not possible to implement the mediator using a script because of the problematic String handling. Also, Spring, POJO and class mediators don't support sub-sequences (I think). Therefore it should be implemented as a full-featured Java mediator, probably taking the existing iterate mediator as a template. I can contribute the required code to get the text content in the form of a java.io.Reader.
>
> Regards,
>
> Andreas
>
> On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
>>
>> Although this is a good feature, it may not solve the actual problem. The first issue on my list was the memory leak. However, the real problem is that once I get these massive files, I have to send them to a Web service that can only take them in small chunks (about 14MB). Streaming a file straight out would just kill the destination Web service: it would get the memory error. The text document can be split apart easily, as it has independent records on each line, separated by <CR><LF>.
>>
>> In an earlier post, which was not responded to, I mentioned:
>>
>> "Otherwise, for large EDI files, a VFS iterator mediator that streams through the input file and outputs smaller chunks for processing in Synapse may be a solution?"
>>
>> So I have mentioned a few solutions in prior posts. The solutions now are:
>>
>> 1) VFS writes straight to a temporary file, then a Java mediator processes the file by splitting it into many smaller files. These files then trigger another VFS proxy that submits them to the final Web service. The problem is that this uses the file system (not so bad).
>> 2) A Java mediator takes the <text> element and splits it up by wrapping it into many XML <data> elements that can then be acted on by a Synapse Iterator.
>> So this replaces the text message with many smaller XML elements. The problem is that this loads the whole message into memory.
>> 3) Create another Iterator in Synapse that works on a regular expression (to split the text data), or that uses a for-loop approach to chop the file into chunks based on the loop index value. E.g. index = 23 would mean the 14K chunk that starts 23 chunks into the data.
>> 4) Using the approach proposed now, just submit the file straight (stream it) to another Web service that chops it up. It may return an XML document with many sub-elements that allows the standard Iterator to work. Similar to (2), but using another service rather than Java to split the document.
>> 5) Using the approach proposed now, just submit the file straight (stream it) to another Web service that chops it up, but calls a Synapse proxy with each small packet of data, which then forwards it to the final Web service. So the Web service iterates across the data, not Synapse.
>>
>> Other solutions replace Synapse with a standalone Java program at the front end.
>>
>> Another issue here is throttling: splitting the file is one issue, but submitting hundreds of calls in parallel to the destination service would result in timeouts. So we need to work in throttling.
>>
>> Ruwan Linton wrote:
>>>
>>> I agree and can understand the time factor, and also +1 for reusing stuff rather than trying to reinvent the wheel :-)
>>>
>>> Thanks,
>>> Ruwan
>>>
>>> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen <[email protected]> wrote:
>>>
>>>> Ruwan,
>>>>
>>>> It's not a question of possibility, it is a question of available time :-)
>>>>
>>>> Also note that some of the features that we might want to implement have some similarities with what is done for attachments in Axiom (except that an attachment is only available once, while a file over VFS can be read several times).
>>>> I think there is also some existing code in Axis2 that might be useful. We should not reimplement these things but try to make the existing code reusable. This, however, is only realistic for the next release after 1.3.
>>>>
>>>> Andreas
>>>>
>>>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]> wrote:
>>>> > Andreas,
>>>> >
>>>> > Can we have caching on the file system as a property, to support multiple layers touching the full message, and is it possible to specify a threshold for streaming? For example, if the message is touched several times we might still need streaming, but not for files of 100KB or less.
>>>> >
>>>> > Thanks,
>>>> > Ruwan
>>>> >
>>>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <[email protected]> wrote:
>>>> >>
>>>> >> I've done an initial implementation of this feature. It is available in trunk and should be included in the next nightly build. To enable it in your configuration, you need to add the following property to the proxy:
>>>> >>
>>>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>>>> >>
>>>> >> You also need to add the following mediators just before the <send> mediator:
>>>> >>
>>>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>>>> >> <property action="set" name="OUT_ONLY" value="true"/>
>>>> >>
>>>> >> With this configuration Synapse will stream the data directly from the incoming to the outgoing transport without storing it in memory or in a temporary file. Note that this has two other side effects:
>>>> >> * The incoming file (or connection, in the case of a remote file) will only be opened on demand. In this case this happens during execution of the <send> mediator.
>>>> >> * If during the mediation the content of the file is needed several times (which is not the case in your example), it will be read several times. The reason, of course, is that the content is not cached.
>>>> >>
>>>> >> I tested the solution with a 2GB file and it worked fine. The performance of the implementation is not yet optimal, but at least the memory consumption is constant.
>>>> >>
>>>> >> Some additional comments:
>>>> >> * The transport.vfs.Streaming property has no impact on XML and SOAP processing: this type of content is processed exactly as before.
>>>> >> * With the changes described here, we now have two different policies for plain text and binary content processing: in-memory caching + no streaming (transport.vfs.Streaming=false) and no caching + deferred connection + streaming (transport.vfs.Streaming=true). We should probably define a wider range of policies in the future, including file system caching + streaming.
>>>> >> * It is necessary to remove the transportNonBlocking property (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send> mediator (more precisely the OperationClient) from executing the outgoing transport in a separate thread. This property is set by the incoming transport. I think this is a bug, since I don't see any valid reason why the transport that handles the incoming request should determine the threading behavior of the transport that sends the outgoing request to the target service. Maybe Asankha can comment on this?
>>>> >>
>>>> >> Andreas
>>>> >>
>>>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]> wrote:
>>>> >> >
>>>> >> > That's good, as this stops us from using Synapse.
>>>> >> >
>>>> >> > Asankha C. Perera wrote:
>>>> >> >>
>>>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>>>> >> >>>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>>>> >> >>>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>>>> >> >>>     at java.lang.StringBuffer.append(StringBuffer.java:307)
>>>> >> >>>     at java.io.StringWriter.write(StringWriter.java:72)
>>>> >> >>>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>>>> >> >>>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>>>> >> >>>     at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>>>> >> >>>     at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>>>> >> >>
>>>> >> >> Since the content type is text, the plain text builder is, as far as I can see, trying to parse it into a String, which is a problem for large content.
>>>> >> >>
>>>> >> >> A definite bug we need to fix.
>>>> >> >>
>>>> >> >> cheers
>>>> >> >> asankha
>>>> >> >>
>>>> >> >> --
>>>> >> >> Asankha C. Perera
>>>> >> >> AdroitLogic, http://adroitlogic.org
>>>> >> >>
>>>> >> >> http://esbmagic.blogspot.com
>>>> >> >>
>>>> >> >> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe, e-mail: [email protected]
>>>> >> >> For additional commands, e-mail: [email protected]
>>>> >> >
>>>> >> > --
>>>> >> > View this message in context: http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>>>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
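The trace quoted above shows why the original builder fails: IOUtils.toString copies the whole payload into a StringWriter, so heap usage grows with the file size. The sketch below illustrates the principle behind the transport.vfs.Streaming option instead: copying through a fixed-size buffer so that memory use stays constant however large the file is. It is an illustration under that assumption, not the actual Synapse or Axis2 implementation; StreamingCopy and BUFFER_SIZE are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustration only, not the Synapse/Axis2 code. Instead of buffering the
// whole payload (IOUtils.toString -> StringWriter -> StringBuffer), data is
// moved through one fixed-size buffer, so memory use does not grow with the
// size of the file.
final class StreamingCopy {

    static final int BUFFER_SIZE = 8192;

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];    // the only allocation per copy
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {  // -1 signals end of stream
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];          // stands in for a large file
        // In-memory sink for the demo only; in practice out would be the
        // outgoing transport's stream, so nothing accumulates on the heap.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), sink);
        System.out.println("copied " + n + " bytes");
    }
}
```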
>>>> >
>>>> > --
>>>> > Ruwan Linton
>>>> > http://wso2.org - "Oxygenating the Web Services Platform"
>>>> > http://ruwansblog.blogspot.com/
>>>> >
>>
>> --
>> View this message in context: http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
>> Sent from the Synapse - Dev mailing list archive at Nabble.com.
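On the throttling concern Kim raises in this thread: Andreas's suggestion of executing the sub-sequence synchronously means only one call reaches the destination service at a time. A sketch that generalizes this to a configurable limit, assuming plain java.util.concurrent rather than any Synapse throttle mediator, could look like the following. ThrottledSender, submit and close are hypothetical names, and the Consumer stands in for the actual call to the target Web service.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical throttle, not a Synapse API: at most maxInFlight sends run at
// the same time, and submit() blocks the caller (e.g. a file splitter) while
// that limit is reached. maxInFlight = 1 matches one-chunk-at-a-time sending.
final class ThrottledSender {

    private final Semaphore permits;
    private final ExecutorService pool;

    ThrottledSender(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
        this.pool = Executors.newFixedThreadPool(maxInFlight);
    }

    /** Blocks until a permit is free, then sends the chunk on a worker thread. */
    void submit(String chunk, Consumer<String> send) throws InterruptedException {
        permits.acquire();                       // back-pressure on the producer
        try {
            pool.execute(() -> {
                try {
                    send.accept(chunk);          // e.g. the call to the target service
                } finally {
                    permits.release();           // free the slot even if the send failed
                }
            });
        } catch (RuntimeException e) {
            permits.release();                   // task was rejected; return the permit
            throw e;
        }
    }

    /** Stops accepting work and waits (up to one minute) for running sends. */
    void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledSender sender = new ThrottledSender(2);
        AtomicInteger sent = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            sender.submit("chunk-" + i, c -> sent.incrementAndGet());
        }
        sender.close();
        System.out.println("sent " + sent.get() + " chunks");
    }
}
```

Because submit blocks once all permits are taken, the splitter can never get more than maxInFlight chunks ahead of the destination, which bounds both the memory held in pending chunks and the load on the target service.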
