Re: to improve the performance of form-based upload for Tomcat 7

Fastupload Tue, 25 Sep 2012 23:57:50 -0700

Chris,


Here are my brief opinions.

> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the us...@tomcat.apache.org mailing list), and
> submit patches.
> 
Thanks for providing the right information.


> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
> 
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.
> 
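As for why the Boyer-Moore algorithm is faster than a simple String.indexOf
search, please see the Wikipedia page:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
In short, Boyer-Moore compares the pattern from right to left and, on a
mismatch, uses precomputed tables to skip ahead by as much as the whole pattern
length, so it usually examines only a fraction of the input bytes. Longer
patterns allow longer skips, so a boundary of several dozen bytes is a
favorable case for it.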

Fastupload's architecture is simpler than that of Commons FileUpload. Also,
Fastupload requires Java 5 or higher, so Commons FileUpload cannot be fixed in
the same way.

In fact, the Boyer-Moore string search algorithm is openly published. I
researched the algorithm and found that it is the right algorithm for finding
an arbitrary string in a text. An anonymous author wrote a Java implementation
of it on Wikipedia. I enhanced that implementation slightly so that it can
search Java byte arrays. The source file is named "BoyerMoore.java" in the
Fastupload project to give credit to Boyer and Moore.
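
To illustrate what searching Java bytes involves, here is a minimal sketch of
a byte-array searcher. It is not the BoyerMoore.java from Fastupload: it
implements the Boyer-Moore-Horspool simplification, which keeps only the
bad-character table (indexed by unsigned byte value) and drops the good-suffix
table. The class name BoundarySearcher is hypothetical. The table is built once
in the constructor, so one instance can be reused for every search against the
same boundary.

```java
import java.util.Arrays;

/** Hypothetical Boyer-Moore-Horspool searcher over byte arrays (a sketch, not Fastupload's code). */
final class BoundarySearcher {
    private final byte[] pattern;
    // shift[b] = how far the window may move when its last byte has unsigned value b
    private final int[] shift = new int[256];

    BoundarySearcher(byte[] pattern) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        this.pattern = pattern.clone();
        int m = pattern.length;
        Arrays.fill(shift, m);                        // bytes absent from the pattern: skip past it entirely
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;     // & 0xFF because Java bytes are signed
        }
    }

    /** Returns the index of the first match in text[from, to), or -1 if there is none. */
    int indexOf(byte[] text, int from, int to) {
        int m = pattern.length;
        int pos = from;
        while (pos <= to - m) {
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;                                   // compare right to left
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[text[pos + m - 1] & 0xFF];   // shift by the window's last byte
        }
        return -1;
    }
}
```

For example, a searcher built for a boundary such as "--AaB03x" can be applied
to each chunk just read, as indexOf(buf, 0, n), without rebuilding the table.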

In any case, BoyerMoore can search any Java byte array, so reading the whole
ServletInputStream into a buffer is not required. Fastupload reads some bytes
from the ServletInputStream and finds the boundary within those bytes, and this
works well. If you're interested, please see StreamUploaderParser.java in the
Fastupload source.
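
A minimal sketch of that chunked search follows. The class and method names
are hypothetical (this is not StreamUploaderParser.java), and it uses the
Boyer-Moore-Horspool simplification (bad-character table only). The subtlety it
shows is that a boundary can straddle two reads, so the last m - 1 bytes of each
buffer (m being the boundary length) are carried over into the next read.
Memory stays bounded by bufferSize no matter how large the request is, and any
InputStream works, including a ServletInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch: find a boundary in a stream using a fixed-size buffer. */
final class StreamingBoundaryScanner {
    private final byte[] pattern;
    private final int[] shift = new int[256];   // Horspool bad-character table, built once

    StreamingBoundaryScanner(byte[] pattern) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        this.pattern = pattern.clone();
        Arrays.fill(shift, pattern.length);
        for (int i = 0; i < pattern.length - 1; i++) {
            shift[pattern[i] & 0xFF] = pattern.length - 1 - i;
        }
    }

    /** Returns the stream offset of the first match, or -1 if the stream ends without one. */
    long findFirst(InputStream in, int bufferSize) throws IOException {
        int m = pattern.length;
        if (bufferSize < m) {
            throw new IllegalArgumentException("bufferSize must be >= pattern length");
        }
        byte[] buf = new byte[bufferSize];
        int filled = 0;   // number of valid bytes in buf
        long base = 0;    // stream offset of buf[0]
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            int hit = search(buf, filled);
            if (hit >= 0) {
                return base + hit;
            }
            // Keep the last m-1 bytes: they may be the start of a boundary split across reads.
            int keep = Math.min(m - 1, filled);
            System.arraycopy(buf, filled - keep, buf, 0, keep);
            base += filled - keep;
            filled = keep;
        }
        return -1;
    }

    private int search(byte[] text, int len) {
        int m = pattern.length;
        int pos = 0;
        while (pos <= len - m) {
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

Carrying m - 1 bytes is the minimum needed: any match that starts earlier
would already have fit entirely inside the previous buffer and been found there.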

Compared with Commons FileUpload and the Cosz upload component, only Fastupload
can parse the part of the multipart data that represents an uploaded file and
write that data directly into a file. This reduces the memory cost of parsing a
large file.
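
The idea of writing a file part straight to disk instead of buffering it in
memory can be sketched as follows. PartCopier is a hypothetical helper, not
Fastupload's code: it copies the bytes of one part to any OutputStream (in a
servlet, typically a FileOutputStream) until the delimiter appears, flushing
everything except the last m - 1 bytes after each read, so memory stays bounded
by bufferSize however large the file is. Bytes read past the delimiter are
pushed back through java.io.PushbackInputStream so parsing can continue with the
next part; the caller is assumed to create that stream with a pushback capacity
of at least bufferSize. For brevity the search is a plain scan; a
Boyer-Moore-style searcher could replace it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PushbackInputStream;

/** Hypothetical sketch: stream one multipart part to an OutputStream with bounded memory. */
final class PartCopier {
    private PartCopier() {}

    /**
     * Copies bytes from in to out until delimiter is found. The delimiter is consumed but
     * not written; look-ahead bytes after it are unread. Assumes in has pushback capacity
     * of at least bufferSize. Returns false if the stream ended without a delimiter.
     */
    static boolean copyUntil(PushbackInputStream in, OutputStream out,
                             byte[] delimiter, int bufferSize) throws IOException {
        int m = delimiter.length;
        if (m == 0 || bufferSize < m) {
            throw new IllegalArgumentException("need 0 < delimiter length <= bufferSize");
        }
        byte[] buf = new byte[bufferSize];
        int filled = 0;
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            int hit = indexOf(buf, filled, delimiter);
            if (hit >= 0) {
                out.write(buf, 0, hit);                  // part body before the delimiter
                int after = hit + m;
                in.unread(buf, after, filled - after);   // give back the look-ahead bytes
                return true;
            }
            // Only the last m-1 bytes could begin a delimiter split across reads;
            // everything before them is certainly body data and can be written now.
            int safe = filled - (m - 1);
            if (safe > 0) {
                out.write(buf, 0, safe);
                System.arraycopy(buf, safe, buf, 0, filled - safe);
                filled -= safe;
            }
        }
        out.write(buf, 0, filled);                        // stream ended without a delimiter
        return false;
    }

    /** Plain left-to-right scan of text[0, len) for pat. */
    private static int indexOf(byte[] text, int len, byte[] pat) {
        outer:
        for (int i = 0; i <= len - pat.length; i++) {
            for (int j = 0; j < pat.length; j++) {
                if (text[i + j] != pat[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```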

> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
> 
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search.

Exactly! Since the Fastupload 0.3.5 release, this enhancement has been in the plan.
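
Caching per request, as Chris suggests, can be sketched by deriving the
delimiter once from the Content-Type header and building the skip table once,
then reusing both for every part. Per RFC 2046 the delimiter between parts is
CRLF followed by "--" and the boundary. The class name RequestDelimiter and the
simplified header parsing (ASCII header, optional surrounding quotes stripped)
are assumptions for illustration, and the search is the Boyer-Moore-Horspool
simplification (bad-character table only) rather than full Boyer-Moore.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical per-request holder: the delimiter and its skip table are built exactly once. */
final class RequestDelimiter {
    private final byte[] delimiter;
    private final int[] shift = new int[256];

    private RequestDelimiter(byte[] delimiter) {
        this.delimiter = delimiter;
        int m = delimiter.length;
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[delimiter[i] & 0xFF] = m - 1 - i;
        }
    }

    /** Simplified parsing of e.g. "multipart/form-data; boundary=AaB03x". */
    static RequestDelimiter fromContentType(String contentType) {
        int i = contentType.toLowerCase(Locale.ROOT).indexOf("boundary=");
        if (i < 0) {
            throw new IllegalArgumentException("no boundary parameter: " + contentType);
        }
        String b = contentType.substring(i + "boundary=".length());
        int semi = b.indexOf(';');
        if (semi >= 0) {
            b = b.substring(0, semi);
        }
        b = b.trim();
        if (b.length() >= 2 && b.startsWith("\"") && b.endsWith("\"")) {
            b = b.substring(1, b.length() - 1);          // strip optional quotes
        }
        if (b.isEmpty()) {
            throw new IllegalArgumentException("empty boundary");
        }
        // Each delimiter is CRLF "--" boundary. The very first one may lack the CRLF when
        // there is no preamble; a real parser handles that case separately.
        return new RequestDelimiter(("\r\n--" + b).getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Offsets of every delimiter in body, all found with the same precomputed table. */
    List<Integer> findAll(byte[] body) {
        List<Integer> hits = new ArrayList<>();
        int m = delimiter.length;
        int pos = 0;
        while (pos <= body.length - m) {
            int j = m - 1;
            while (j >= 0 && body[pos + j] == delimiter[j]) {
                j--;
            }
            if (j < 0) {
                hits.add(pos);
                pos += m;   // boundaries cannot contain CR/LF, so delimiters never overlap
            } else {
                pos += shift[body[pos + m - 1] & 0xFF];
            }
        }
        return hits;
    }
}
```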

> But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.





On Sep 25, 2012, at 11:40 PM, Christopher Schultz 
<ch...@christopherschultz.net> wrote:

> Link,
> 
> On 9/25/12 10:14 AM, Fastupload wrote:
>> What's the right org through which I can apply for a committer account
>> on an Apache open source project?
> 
> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the us...@tomcat.apache.org mailing list), and
> submit patches.
> 
> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
> 
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.
> 
> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
> 
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search. But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.
> 
> -chris
>
