Chris,
here are my brief opinions.

> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the us...@tomcat.apache.org mailing list), and
> submit patches.

Thanks for providing the right info.

> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
>
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.

For why the Boyer-Moore algorithm is faster than a simple String.indexOf
search, you can refer to the wiki page:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

In fastupload, the architecture is simpler than in commons file upload.
Also, fast upload requires Java 5 or higher, so commons file upload cannot
be fixed in the same way.

In fact, the Boyer-Moore string search algorithm is open. I researched the
algorithm and found that it is the right algorithm for finding arbitrary
characters in text. An unnamed author wrote the Java implementation of it
on the wiki. I made a small enhancement to that implementation so that it
can search content in Java bytes. The source file in the fast upload
project is named "BoyerMoore.java" to acknowledge Boyer and Moore.

However, BoyerMoore can search any Java bytes. Reading all the bytes of
the ServletInputStream into a buffer is not required. Reading some bytes
from the ServletInputStream and finding the boundary in those bytes works
well. If you're interested in it, please see the source code
StreamUploaderParser.java in the fast upload source.

Compared with commons file upload and the Cosz upload component, only the
fast upload component provides a solution that parses the part of the
multipart data representing an uploaded file and writes that data to a
file. This solution reduces the memory cost when parsing a large file.

> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
>
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search.

Exactly! Since the fast upload 0.3.5 release, this enhancement has been
part of the plan.

> But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.

On Sep 25, 2012, at 11:40 PM, Christopher Schultz <ch...@christopherschultz.net> wrote:

> Link,
>
> On 9/25/12 10:14 AM, Fastupload wrote:
>> What's the right org that I can apply a commuter account of apache
>> open source project?
>
> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the us...@tomcat.apache.org mailing list), and
> submit patches.
>
> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
>
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.
>
> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
>
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search. But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.
>
> -chris
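
For anyone following the thread, the two technical points discussed above (searching the stream for the boundary chunk by chunk instead of buffering a whole part, and building the skip table once per request) can be illustrated with a rough Java sketch. This is only a sketch under assumptions: it uses the Horspool simplification of Boyer-Moore rather than the code in fastupload's BoyerMoore.java, and the names BoundaryMatcher, firstMatch and overlap are hypothetical, not part of fastupload or commons file upload.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: Boyer-Moore-Horspool search over bytes with a reusable skip table. */
final class BoundaryMatcher {
    private final byte[] pattern;
    // Bad-character skip table, built once and reused for every search in the request.
    private final int[] shift = new int[256];

    BoundaryMatcher(byte[] pattern) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("empty pattern");
        }
        this.pattern = pattern.clone();
        Arrays.fill(shift, pattern.length);
        // For every pattern byte except the last, record its distance from the pattern's end.
        for (int i = 0; i < pattern.length - 1; i++) {
            shift[pattern[i] & 0xFF] = pattern.length - 1 - i;
        }
    }

    /** Returns the index of the first match inside data[from, to), or -1 if there is none. */
    int indexOf(byte[] data, int from, int to) {
        int m = pattern.length;
        int pos = from;
        while (pos <= to - m) {
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) {
                j--; // compare right to left
            }
            if (j < 0) {
                return pos;
            }
            // Skip ahead based on the byte aligned with the pattern's last position.
            pos += shift[data[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    /** Bytes to carry into the next chunk so a boundary split across two reads is not missed. */
    int overlap() {
        return pattern.length - 1;
    }

    /** Scans the stream in chunks (chunkSize must be positive); returns the offset of the first match or -1. */
    static long firstMatch(InputStream in, BoundaryMatcher matcher, int chunkSize) throws IOException {
        int keep = matcher.overlap();
        byte[] buf = new byte[keep + chunkSize];
        int filled = 0; // number of valid bytes in buf
        long base = 0;  // stream offset of buf[0]
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            int hit = matcher.indexOf(buf, 0, filled);
            if (hit >= 0) {
                return base + hit;
            }
            // Keep only the tail that could still be the start of a boundary.
            int tail = Math.min(keep, filled);
            System.arraycopy(buf, filled - tail, buf, 0, tail);
            base += filled - tail;
            filled = tail;
        }
        return -1;
    }
}
```

With this shape, memory use is bounded by the chunk size plus the boundary length, whatever the size of the uploaded part, and the skip table is computed once in the constructor instead of on every search.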