Hi Samisa,

IIRC, this discussion is about handling attachments and thus does not
relate to caching. Though $subject says "Caching", what was actually
discussed was a mechanism to buffer the attachment in a file rather
than in memory, and that buffer has nothing to do with caching, which
is a totally different concept, as in [1].

The previous mail I sent was a reply to Manjula's concern about
handling the scenario where the MIME boundary is split across two
reads. Unlike the previous scenarios, a block, once read, will be
flushed to a file instead of being kept in memory, so the parsing has
to be rethought. Sorry if it confused you.

IMHO, writing a partially parsed buffer to a file is not that
efficient, as we will have to parse it again later to discover the
MIME boundaries and extract the attachments. Thus, I still believe
that real-time buffering to a file while parsing is the better choice.
To implement that, we will have to modify our mime_parser.c and
probably the data_handler implementation.
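
To make the idea concrete, here is a minimal sketch of what I mean by
real-time buffering to a file while parsing. The names (file_buffer_t,
file_buffer_flush) are illustrative only, not the actual mime_parser.c
or data_handler API:

```c
/* Illustrative sketch only; not the actual Axis2/C data_handler API.
 * The idea: once the parser has identified a run of attachment bytes
 * (bytes known not to contain the MIME boundary), flush them straight
 * to a spill file instead of growing an in-memory buffer. */
#include <stdio.h>

typedef struct file_buffer
{
    FILE *fp;      /* attachment spill file */
    long written;  /* bytes flushed so far */
} file_buffer_t;

/* Called by the parser for each run of attachment bytes it consumes. */
static int
file_buffer_flush(file_buffer_t *fb, const char *data, size_t len)
{
    if (fwrite(data, 1, len, fb->fp) != len)
        return -1;
    fb->written += (long)len;
    return 0;
}
```

That way memory use stays bounded by the parse buffer, not by the
attachment size.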

Or if not, am I misunderstanding $subject?

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

Regards,
Senaka

> Senaka Fernando wrote:
>>> Hi Manjula, Thilina and others,
>>>
>>> Yep, I think I'm in exactly the same viewpoint as Thilina when it
>>> comes to handling attachment data, at least for the chunking part.
>>> I think I didn't get Thilina right in his first e-mail.
>>>
>>> However, a file per MIME part may not always be optimal. I'd say,
>>> rather, that each file should have a fixed maximum size, and if that
>>> is exceeded, perhaps you can divide it in two. Also, a user should
>>> always be given the option to choose between Thilina's method and
>>> this method through axis2.xml (or services.xml). Thus, a user can
>>> fine-tune memory use.
>>>
>>> When it comes to base64-encoded binary data, you can use a mechanism
>>> where the buffer always has a size that is a multiple of 4; then,
>>> when you flush, you decode the buffer and copy the result to the
>>> file. To the user, that should essentially look the same as caching.
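
To illustrate the multiple-of-4 point quoted above: if the read buffer
length is always a multiple of 4, every flush hands the decoder only
complete base64 quanta, so no encoded state has to be carried between
flushes. A self-contained sketch (not the actual Axis2/C codec):

```c
/* Illustrative base64 decoder for complete 4-char quanta only.
 * Keeping the read buffer a multiple of 4 guarantees we never split
 * a quantum across two flushes. */
#include <stddef.h>

static int
b64_val(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1; /* '=' padding or invalid */
}

/* Decode in_len chars (in_len must be a multiple of 4) into out.
 * Returns the number of decoded bytes. */
static size_t
b64_decode_quanta(const char *in, size_t in_len, unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i + 3 < in_len; i += 4)
    {
        int v0 = b64_val(in[i]),     v1 = b64_val(in[i + 1]);
        int v2 = b64_val(in[i + 2]), v3 = b64_val(in[i + 3]);
        out[o++] = (unsigned char)((v0 << 2) | (v1 >> 4));
        if (v2 >= 0) out[o++] = (unsigned char)((v1 << 4) | (v2 >> 2));
        if (v3 >= 0) out[o++] = (unsigned char)((v2 << 6) | v3);
    }
    return o;
}
```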
>>>
>>> OK, so Manjula, you mean when the MIME boundary appears partially in
>>> the first read and partially in the second?
>>>
>>> Well this is probably the best solution.
>>>
>>> You allocate a buffer twice the size of the MIME boundary marker. In
>>> the very first read, you read two times the boundary length and then
>>> search for the MIME boundary. Next you do a memmove() that shifts
>>> the contents of the buffer from the midpoint to the end down to the
>>> beginning. After doing this, you read a size equal to half the
>>> buffer (which, again, is the size of the MIME boundary marker) and
>>> store it from the midpoint of the buffer to the end. Then you search
>>> again. You iterate this procedure until a read returns less than
>>> half the size of the buffer.
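
The sliding-window procedure quoted above, sketched in C (read_fn_t,
find_boundary, and the other names are illustrative; a binary-safe byte
search is used instead of strstr(), since strstr() stops at an embedded
'\0'):

```c
/* Illustrative sketch of the sliding-window boundary search.
 * Buffer capacity is 2 * boundary length: after each failed search the
 * top half is memmove()d down and refilled, so a boundary split across
 * two reads is always seen whole in one search. */
#include <string.h>
#include <stddef.h>

typedef size_t (*read_fn_t)(void *ctx, char *dst, size_t len);

/* Binary-safe search; strstr() would stop at an embedded '\0'. */
static long
find_in(const char *hay, size_t hlen, const char *needle, size_t nlen)
{
    size_t i;
    for (i = 0; nlen <= hlen && i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}

/* Returns the absolute stream offset of the boundary, or -1 if the
 * stream ends first. buf must have capacity >= 2 * blen. */
static long
find_boundary(read_fn_t read_fn, void *ctx,
              const char *boundary, size_t blen, char *buf)
{
    long base = 0;
    size_t have = read_fn(ctx, buf, 2 * blen);  /* first read: 2 x blen */
    for (;;)
    {
        long pos = find_in(buf, have, boundary, blen);
        if (pos >= 0)
            return base + pos;
        if (have < 2 * blen)                    /* short read: stream done */
            return -1;
        memmove(buf, buf + blen, blen);         /* slide top half down */
        base += (long)blen;
        have = blen + read_fn(ctx, buf + blen, blen);
    }
}

/* Simple in-memory reader, just to exercise the sketch. */
typedef struct { const char *data; size_t len, off; } mem_ctx_t;

static size_t
mem_read(void *ctx, char *dst, size_t len)
{
    mem_ctx_t *m = (mem_ctx_t *)ctx;
    size_t n = m->len - m->off;
    if (n > len) n = len;
    memcpy(dst, m->data + m->off, n);
    m->off += n;
    return n;
}
```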
>>>
>>
>> If you are interested further in this mechanism, I used this approach
>> for resending binary data with TCPMon. You may check that as well.
>>
>> Also, strstr() has issues when there is a '\0' in the middle of the
>> data. Thus you will have to use a temporary search marker and use
>> that in the process. Before calling strstr(), check whether
>> strlen(temp) is greater than or equal to the length of the MIME
>> boundary marker. If it is greater, you only need to search once; if
>> it is equal, you need to search exactly twice; if it is less, you
>> increment temp by strlen(temp) and repeat until you cross the
>> midpoint. This makes the search more efficient.
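
To show the '\0' problem quoted above concretely (bin_find is a
hypothetical binary-safe replacement built on memcmp(), which avoids
the temporary-marker dance entirely):

```c
/* strstr() treats its haystack as a C string, so it stops at the first
 * '\0' and cannot search raw MIME part data. Comparing raw bytes with
 * memcmp() sidesteps the problem. */
#include <string.h>
#include <stddef.h>

static long
bin_find(const char *hay, size_t hlen, const char *needle, size_t nlen)
{
    size_t i;
    for (i = 0; nlen <= hlen && i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}
```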
>>
>> If you want to make the search even more efficient, you can make the
>> buffer size one less than the size of the MIME boundary marker, so
>> that in the "equal" scenario you only have to search once.
>>
>> What I have relied on here is that strstr() and strlen() behave the
>> same way within a given implementation: on Windows, if strlen() is
>> multibyte-aware, so is strstr(). So, no worries.
>>
>
> We already have an efficient parsing mechanism, tested and proven to
> work, in 1.3. Why on earth are we discussing this over and over again?
>
> Does caching get affected by the MIME parser logic? IMHO, no. They are
> two separate concerns, so why are we wasting time discussing parsing
> when the problem at hand is not parsing but caching?
>
> Writing the partially parsed buffer was a solution to caching. Do we
> have any other alternatives? If so, in short, what are they?
>
> Samisa...
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

