Delayed entry parsing Was: [classlib][luni] optimized InputStream reader (HARMONY-5522)

Alexei Fedotov Tue, 18 Mar 2008 03:58:50 -0700

Hello folks,

I was thinking about delaying parsing of Manifest entries to get even
more speed up while merging Tim's inlinable ByteStream (it is called
ByteSnapshot now) and faced a design choice. Manifest#getEntries() is
not reported to throw IOException. It means that if I delay parsing
entries till this call and get a parse error, I can no longer throw
conventional IOException.


What should I do?
1. Throw new RuntimeException
2. Set all entries to null (and probably get NPE very soon)
3. Ignore a particular attribute (name: value) with a parsing error as
specification might be suggesting [1]
4. Parse entries during read() method / constructor invocation as RI does.

Suggestions?

[1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html, search for
"Attributes which are not understood are ignored."

On Fri, Mar 14, 2008 at 8:30 PM, Alexei Fedotov
<[EMAIL PROTECTED]> wrote:
> I just want to have the following written somewhere until I forget: in
>  one of hot spots Eclipse guys parse the whole manifest to get two main
>  attributes, and then drop the object. There is no need to parse 100k
>  of ICU manifst to populate entry signatures for that use case. In
>  other words one may think of delaying parsing of other sections then
>  the main one.
>
>  On Fri, Mar 14, 2008 at 8:22 PM, Alexei Fedotov
>
> <[EMAIL PROTECTED]> wrote:
>
>
> > Tim, thank you for reading and understanding. I like when my code
>  >  passes a review and I am eager to try arguments I've created for
>  >  myself. I thought of your variant and liked it because it would be
>  >  possible to JIT to inline your version of ByteStream.readFully()
>  >  method.
>  >
>  >  There is some complication caused with a public API exposed by
>  >  Manifest class: the constructor and read methods accept a general
>  >  InputStream as a source for Manifest. Other applications, such as
>  >  Eclipse, read a file themselves and then call Manifest API directly to
>  >  get few main attributes. There is a better chance that external
>  >  developers would use use documented ByteArrayInputStream to wrap their
>  >  bytes than "our possibly optimized internal child" of that class.
>  >
>  >  I also believe that I decrease overall project astonishment by
>  >  encapsulating more complex technique in one place instead of spreading
>  >  less conventional things around. Reflection is hidden in a class with
>  >  conventional API, and no one would question when reading
>  >  ZipFile#getInputStream(ZipEntry) code why he should wrap a resulted
>  >  byte buffer in "our possibly optimized internal child" instead of
>  >  ByteArrayInputStream.
>  >
>  >  Finally two these methods may be combined: in addition to
>  >  ByteArrayInputStream one may check in readFully(InputStream) if "our
>  >  possibly optimized internal child" is supplied an use a quick path
>  >  without reflection. I wonder what you and other java gurus think about
>  >  this.
>  >
>  >
>  >
>  >
>  >  On Fri, Mar 14, 2008 at 7:47 PM, Tim Ellison <[EMAIL PROTECTED]> wrote:
>  >  > Alexei Fedotov wrote:
>  >  >  > Let me write something more understandable. Breathe, Alexei, breathe 
> slowly.
>  >  >
>  >  >  Code is good ;-)
>  >  >
>  >  >
>  >  >  > 1. Our native code returns us Zip entries in a form of a byte array
>  >  >  > which is later wrapped in ByteArrayInputSteam object.
>  >  >
>  >  >  So this is my point.  While I see that you are writing the ByteStream
>  >  >  util to reach into a regular ByteArrayInputStream to get the backing
>  >  >  array -- my question is why not change the
>  >  >  ZipFile#getInputStream(ZipEntry) implementation to return a ByteStream
>  >  >  (which will be a subclass of ByteArrayInputStream that will simply
>  >  >  return the backing array when asked without reflection).
>  >  >
>  >  >  I agree with the rest of the arguments below.
>  >  >
>  >  >  Regards,
>  >  >  Tim
>  >  >
>  >  >
>  >  >
>  >  >  > 2. Instead of copying portions of this ByteArrayInputSteam via
>  >  >  > read(buffer) call it is quicker to get a reference to the underlying
>  >  >  > buffer using reflection and work with this array.
>  >  >  > 3. This big buffer is useful for further manifest verification. This
>  >  >  > is a subject of separate [1]. Instead of copying manifest sections
>  >  >  > into smaller byte buffers, we maintain a list of relative positions 
> in
>  >  >  > the big buffer, and update the digests using these positions.
>  >  >  > Generally, storing two relative positions is much cheaper than
>  >  >  > allocating memory and populating it with bytes. That is why I refer 
> to
>  >  >  > this change as to implementation cleanup, and not an optimization.
>  >  >  > 4.  I wrote ideas of possible further cleanup in TODO comments. They
>  >  >  > could be described as follows: I removed two of six copies (acts of
>  >  >  > copying) of a manifest byte before it becomes accessible as a key or
>  >  >  > value in a java hash table. Another two copies are easy to be 
> removed,
>  >  >  > so there is enough place for improvement here.
>  >  >  >
>  >  >  > If we would only know the algorithm which will be used for digesting,
>  >  >  > we may replace a manifest buffer with few signatures which might be
>  >  >  > updated while manifest is read. After algorithm is known the manifest
>  >  >  > may be verified in one pass.
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >  > On Thu, Mar 13, 2008 at 6:03 PM, Tim Ellison <[EMAIL PROTECTED]> 
> wrote:
>  >  >  >> Alexei Fedotov wrote:
>  >  >  >>  > I see the question. The general InoouttStream cannot be 
> optimized, but
>  >  >  >>  > our implementation returns ByteArrayInputStream. We can choose a 
> fast
>  >  >  >>  > path without using arraycopy  if we are first to read from this
>  >  >  >>  > ByteArrayInputStream object. We just get get a reference to the
>  >  >  >>  > underlying buffer and return it without copying the contents. 
> This
>  >  >  >>  > works only for unmodified ByteArrayInputStream object, but this 
> is
>  >  >  >>  > exactly our case.
>  >  >  >>
>  >  >  >>  Yep, I see that.  Now how will you use it (i.e. can you show the 
> code
>  >  >  >>  that references ByteStream)?  I can see where you are going but 
> want to
>  >  >  >>  see the whole picture.
>  >  >  >>
>  >  >  >>  Regards,
>  >  >  >>  Tim
>  >  >  >>
>  >  >  >>
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >
>  >
>  >
>  >
>  >  --
>  >  With best regards,
>  >  Alexei
>  >
>
>
>
>  --
>  With best regards,
>  Alexei
>



-- 
With best regards,
Alexei

Delayed entry parsing Was: [classlib][luni] optimized InputStream reader (HARMONY-5522)

Reply via email to