I committed the code and it will be available in the next WS-Commons
transport build. The methods are located in
org.apache.axis2.format.ElementHelper in the axis2-transport-base
module.
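
For illustration, here is a minimal sketch of how a custom mediator might consume the text content once it is available as a java.io.Reader: it reads line-oriented records and hands them to a handler one chunk at a time, synchronously, so that only one chunk is held in memory. This is not the committed ElementHelper code. The class RecordChunker, the method forEachChunk and the handler callback are hypothetical names, and the sketch assumes that records end with a line break and that the limit is counted in characters rather than bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical helper, not part of Synapse or axis2-transport-base.
public class RecordChunker {

    /**
     * Reads line-oriented records from the reader and passes them to the
     * handler in chunks of at most maxChunkChars characters. A single record
     * longer than the limit is passed on its own. Returns the chunk count.
     */
    public static int forEachChunk(Reader in, int maxChunkChars,
                                   Consumer<String> handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        int chunks = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            int needed = line.length() + 2; // record plus CR LF
            // Flush the current chunk before it would exceed the limit.
            if (chunk.length() > 0 && chunk.length() + needed > maxChunkChars) {
                handler.accept(chunk.toString()); // synchronous call per chunk
                chunks++;
                chunk.setLength(0);
            }
            chunk.append(line).append("\r\n");
        }
        // Flush whatever is left after the last record.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("rec1\r\nrec2\r\nrec3\r\n");
        forEachChunk(in, 12, c -> System.out.println("chunk of " + c.length() + " chars"));
    }
}
```

In a real mediator the handler would run the sub-sequence for the chunk and return only when it has completed, which is what keeps memory use bounded and avoids flooding the destination service.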

Andreas

On Thu, Mar 12, 2009 at 00:06, Kim Horn <[email protected]> wrote:
> Hello Andreas,
> This is great and really helps. I have not had time to try it out yet, but will soon.
>
> Contributing the java.io.Reader would be a great help but it will take me a 
> while to get up to speed to do the Synapse iterator.
>
> In the short term I am going to use a brute-force approach that is now
> feasible given that the memory issue is resolved. I just thought of this one
> today. Use a VFS proxy to FTP the file locally, so streaming helps here. A
> POJOCommand on <out> splits the file into another directory, streaming in
> and out. Another independent VFS proxy watches that directory and submits
> each file to the Web service. Hopefully memory will be fine. Overloading the
> destination may still be an issue?
>
> Kim
>
>
>
> -----Original Message-----
> From: Andreas Veithen [mailto:[email protected]]
> Sent: Monday, 9 March 2009 10:55 PM
> To: [email protected]
> Subject: Re: VFS - Synapse Memory Leak
>
> The changes I did in the VFS transport and the message builders for
> text/plain and application/octet-stream certainly don't provide an
> out-of-the-box solution for your use case, but they are the
> prerequisite.
>
> Concerning your first proposed solution (let the VFS write the content
> to a temporary file), I don't like this because it would create a
> tight coupling between the VFS transport and the mediator. A design
> goal should be that the solution will still work if the file comes
> from another source, e.g. an attachment in an MTOM or SwA message.
>
> I think that an all-Synapse solution (2 or 3) should be possible, but
> this will require development of a custom mediator. This mediator
> would read the content, split it up (and store the chunks in memory or
> on disk) and execute a sub-sequence for each chunk. The execution of
> the sub-sequence would happen synchronously to limit the memory/disk
> space consumption (to the maximum chunk size) and to avoid flooding
> the destination service.
>
> Note that it is probably not possible to implement the mediator
> using a script because of the problematic String handling. Also,
> Spring, POJO and class mediators don't support sub-sequences (I
> think). Therefore it should be implemented as a full-featured Java
> mediator, probably taking the existing iterate mediator as a template.
> I can contribute the required code to get the text content in the form
> of a java.io.Reader.
>
> Regards,
>
> Andreas
>
> On Mon, Mar 9, 2009 at 03:05, kimhorn <[email protected]> wrote:
>>
>> Although this is a good feature, it may not solve the actual problem.
>> The first issue on my list was the memory leak.
>> However, the real problem is that once I get this massive file I have to send
>> it to a web service that can only take it in small chunks (about 14MB).
>> Streaming it straight out would just kill the destination web service: it
>> would get the memory error. The text document can be split apart easily, as
>> it has independent records on each line separated by <CR><LF>.
>>
>> In an earlier post, which was not responded to, I mentioned:
>>
>> "Otherwise; for large EDI files a VFS iterator Mediator that streams through
>> input file and outputs smaller
>> chunks for processing, in Synapse, may be a solution ? "
>>
>> So I had mentioned a few solutions in prior posts; the solutions now are:
>>
>> 1) VFS writes straight to temporary file, then a Java mediator can process
>> the file by splitting it into many smaller files. These files then trigger
>> another VFS proxy that submits these to the final web Service.
>> The problem is that it uses the file system (not so bad).
>> 2) A Java Mediator takes the <text> package and splits it up by wrapping
>> into many XML <data> elements that can then be acted on by a Synapse
>> Iterator. So replace the text message with many smaller XML elements.
>> Problem is that this loads whole message into memory.
>> 3) Create another Iterator in Synapse that works on Regular expression (to
>> split the text data) or actually uses a for loop approach to chop the file
>> into chunks based on the loop index value. E.g. Index = 23 means a 14K chunk
>> 23 chunks into the data.
>> 4) Using the approach proposed now - just submit the file straight (stream
>> it) to another web service that chops it up. It may return an XML document
>> with many sub-elements that allows the standard Iterator to work. Similar
>> to (2) but using another service rather than Java to split document.
>> 5) Using the approach proposed now - just submit the file straight (stream
>> it) to another web service that chops it up but calls a Synapse proxy with
>> each small packet of data that then forwards it to the final Web Service. So
>> the Web Service iterates across the data; and not Synapse.
>>
>> Then other solutions replace Synapse with a stand-alone Java program at the
>> front end.
>>
>> Another issue here is throttling: splitting the file is one issue, but
>> submitting hundreds of calls in parallel to the destination service would
>> result in timeouts. So we need to work in throttling.
>>
>>
>>
>>
>>
>>
>>
>>
>> Ruwan Linton wrote:
>>>
>>> I agree and can understand the time factor, and also +1 for reusing stuff
>>> rather than trying to reinvent the wheel :-)
>>>
>>> Thanks,
>>> Ruwan
>>>
>>> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen
>>> <[email protected]>wrote:
>>>
>>>> Ruwan,
>>>>
>>>> It's not a question of possibility, it is a question of available time
>>>> :-)
>>>>
>>>> Also note that some of the features that we might want to implement
>>>> have some similarities with what is done for attachments in Axiom
>>>> (except that an attachment is only available once, while a file over
>>>> VFS can be read several times). I think there is also some existing
>>>> code in Axis2 that might be useful. We should not reimplement these
>>>> things but try to make the existing code reusable. This however is
>>>> only realistic for the next release after 1.3.
>>>>
>>>> Andreas
>>>>
>>>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <[email protected]>
>>>> wrote:
>>>> > Andreas,
>>>> >
>>>> > Can we have caching on the file system as a property, to support
>>>> > multiple layers touching the full message, and is it possible to
>>>> > specify a threshold for streaming? For example, if the message is
>>>> > touched several times we might still need streaming, but not for
>>>> > files of 100KB or less.
>>>> >
>>>> > Thanks,
>>>> > Ruwan
>>>> >
>>>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <[email protected]> wrote:
>>>> >>
>>>> >> I've done an initial implementation of this feature. It is available
>>>> >> in trunk and should be included in the next nightly build. In order to
>>>> >> enable this in your configuration, you need to add the following
>>>> >> property to the proxy:
>>>> >>
>>>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>>>> >>
>>>> >> You also need to add the following mediators just before the <send>
>>>> >> mediator:
>>>> >>
>>>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>>>> >> <property action="set" name="OUT_ONLY" value="true"/>
>>>> >>
>>>> >> With this configuration Synapse will stream the data directly from the
>>>> >> incoming to the outgoing transport without storing it in memory or in
>>>> >> a temporary file. Note that this has two other side effects:
>>>> >> * The incoming file (or connection in case of a remote file) will only
>>>> >> be opened on demand. In this case this happens during execution of the
>>>> >> <send> mediator.
>>>> >> * If during the mediation the content of the file is needed several
>>>> >> times (which is not the case in your example), it will be read several
>>>> >> times. The reason is of course that the content is not cached.
>>>> >>
>>>> >> I tested the solution with a 2GB file and it worked fine. The
>>>> >> performance of the implementation is not yet optimal, but at least the
>>>> >> memory consumption is constant.
>>>> >>
>>>> >> Some additional comments:
>>>> >> * The transport.vfs.Streaming property has no impact on XML and SOAP
>>>> >> processing: this type of content is processed exactly as before.
>>>> >> * With the changes described here, we now have two different policies
>>>> >> for plain text and binary content processing: in-memory caching + no
>>>> >> streaming (transport.vfs.Streaming=false) and no caching + deferred
>>>> >> connection + streaming (transport.vfs.Streaming=true). Probably we
>>>> >> should define a wider range of policies in the future, including file
>>>> >> system caching + streaming.
>>>> >> * It is necessary to remove the transportNonBlocking property
>>>> >> (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send> mediator
>>>> >> (more precisely the OperationClient) from executing the outgoing
>>>> >> transport in a separate thread. This property is set by the incoming
>>>> >> transport. I think this is a bug since I don't see any valid reason
>>>> >> why the transport that handles the incoming request should determine
>>>> >> the threading behavior of the transport that sends the outgoing
>>>> >> request to the target service. Maybe Asankha can comment on this?
>>>> >>
>>>> >> Andreas
>>>> >>
>>>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <[email protected]> wrote:
>>>> >> >
>>>> >> > That's good, as this stops us using Synapse.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Asankha C. Perera wrote:
>>>> >> >>
>>>> >> >>
>>>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>>>> >> >>>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>>>> >> >>>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>>>> >> >>>         at java.lang.StringBuffer.append(StringBuffer.java:307)
>>>> >> >>>         at java.io.StringWriter.write(StringWriter.java:72)
>>>> >> >>>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>>>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>>>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>>>> >> >>>         at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>>>> >> >>>         at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>>>> >> >>>
>>>> >> >> Since the content type is text, the plain text formatter is trying to
>>>> >> >> use a String to parse, as I see, which is a problem for large content.
>>>> >> >>
>>>> >> >> A definite bug we need to fix ..
>>>> >> >>
>>>> >> >> cheers
>>>> >> >> asankha
>>>> >> >>
>>>> >> >> --
>>>> >> >> Asankha C. Perera
>>>> >> >> AdroitLogic, http://adroitlogic.org
>>>> >> >>
>>>> >> >> http://esbmagic.blogspot.com
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> ---------------------------------------------------------------------
>>>> >> >> To unsubscribe, e-mail: [email protected]
>>>> >> >> For additional commands, e-mail: [email protected]
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >> > --
>>>> >> > View this message in context:
>>>> >> >
>>>> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>>>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Ruwan Linton
>>>> > http://wso2.org - "Oxygenating the Web Services Platform"
>>>> > http://ruwansblog.blogspot.com/
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Ruwan Linton
>>> http://wso2.org - "Oxygenating the Web Services Platform"
>>> http://ruwansblog.blogspot.com/
>>>
>>>
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
>> Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>
>>
>>
>>
>
>
>

