[ 
https://issues.apache.org/jira/browse/HADOOP-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688568#action_12688568
 ] 

Chris Douglas commented on HADOOP-5553:
---------------------------------------

I'll try to explain my position.

The SequenceFile.Reader instance creates the ValueBytes instance passed to 
nextRawValue. Passing in an arbitrary ValueBytes object is contrary to its 
design, because there is no guarantee that such an object observes the 
semantics the Reader expects (which is why the instance is cast to a subclass 
in nextRaw*). If exchanging ValueBytes instances between Readers is not 
guaranteed to work, then supporting more general user code certainly should 
not be.
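
To make the cast concrete, here is a minimal sketch of the pattern. It is not 
the real SequenceFile code: CastSketch, ToyReader and ReaderBytes are 
hypothetical names, the ValueBytes interface is cut down to one method, and an 
in-memory array stands in for the file.

```java
public class CastSketch {
    /** Simplified stand-in for the public ValueBytes interface (assumption). */
    public interface ValueBytes {
        int getSize();
    }

    /** Toy reader over in-memory records; hypothetical, not SequenceFile.Reader. */
    public static class ToyReader {
        private final byte[][] records;
        private int next = 0;

        public ToyReader(byte[][] records) {
            this.records = records;
        }

        // Private implementation: only the reader knows how to fill it.
        private static final class ReaderBytes implements ValueBytes {
            private byte[] data = new byte[0];

            void reset(byte[] d) {
                data = d;
            }

            @Override
            public int getSize() {
                return data.length;
            }
        }

        /** The reader hands out the only instances it knows how to fill. */
        public ValueBytes createValueBytes() {
            return new ReaderBytes();
        }

        /** Fills val with the next record and returns its size. */
        public int nextRawValue(ValueBytes val) {
            // The cast rejects anything not obtained from createValueBytes().
            ReaderBytes rb = (ReaderBytes) val;
            rb.reset(records[next++]);
            return rb.getSize();
        }
    }
}
```

In this sketch a user-written ValueBytes fails with a ClassCastException before 
the reader touches its stream position, so the reader never has to reason 
about an implementation it did not create.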

Consider how much wider the interface becomes when user code is permitted. 
Right now, SequenceFile.Reader is threadsafe, because the full record is 
consumed in nextRaw. If a lazy reader were to block that stream until the full 
value were consumed, it would introduce the possibility of deadlock (and if it 
didn't block, its results would be undefined). Lazy vbytes A might block, while 
lazy vbytes B might page to disk if there's contention. Threads might pass a 
mix of ValueBytes instances into the same Reader, some written by the user, 
others from library code. In my mind, lazily loading ValueBytes is a worthy 
feature for SequenceFile.Reader itself, not one of a set of possible extensions 
to a binary interface. Again, I can't think of a second such extension.

As an alternative, consider implementing {{createValueBytes(boolean lazy)}}, 
which returns a ValueBytes instance that reads the value lazily when 
{{lazy}} is true. This permits SequenceFile to initialize any locks/state 
necessary to support lazy reads, keeps the contract contained in SequenceFile, 
and adds only one parameter to an advanced interface.
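
A rough sketch of how such a factory could look, under stated assumptions: 
this is not the actual SequenceFile API, only the name createValueBytes(boolean 
lazy) comes from the proposal above, and ToyReader, EagerBytes, LazyBytes and 
streamLock are hypothetical. An in-memory array stands in for the underlying 
stream.

```java
import java.io.IOException;
import java.io.OutputStream;

public class LazyValueBytesSketch {
    /** Simplified stand-in for the public ValueBytes interface (assumption). */
    public interface ValueBytes {
        void writeTo(OutputStream out) throws IOException;
        int getSize();
    }

    /** Toy reader over in-memory records; hypothetical, not SequenceFile.Reader. */
    public static class ToyReader {
        private final byte[][] records;
        private int next = 0;
        // Lock owned by the reader, so lazy access follows the reader's rules.
        private final Object streamLock = new Object();

        public ToyReader(byte[][] records) {
            this.records = records;
        }

        private abstract static class ReaderBytes implements ValueBytes {
            abstract void bind(ToyReader reader, int index);
        }

        /** Copies the value as soon as it is bound (the existing behaviour). */
        private static final class EagerBytes extends ReaderBytes {
            private byte[] data = new byte[0];

            @Override
            void bind(ToyReader reader, int index) {
                data = reader.records[index].clone();
            }

            @Override
            public int getSize() {
                return data.length;
            }

            @Override
            public void writeTo(OutputStream out) throws IOException {
                out.write(data);
            }
        }

        /** Remembers where the value lives and reads it only in writeTo. */
        private static final class LazyBytes extends ReaderBytes {
            private ToyReader reader;
            private int index = -1;

            @Override
            void bind(ToyReader reader, int index) {
                this.reader = reader;
                this.index = index;
            }

            @Override
            public int getSize() {
                return index < 0 ? 0 : reader.records[index].length;
            }

            @Override
            public void writeTo(OutputStream out) throws IOException {
                if (index < 0) {
                    return; // nothing bound yet
                }
                synchronized (reader.streamLock) {
                    out.write(reader.records[index]);
                }
            }
        }

        /** One extra parameter selects the implementation; both stay private. */
        public ValueBytes createValueBytes(boolean lazy) {
            return lazy ? new LazyBytes() : new EagerBytes();
        }

        public int nextRawValue(ValueBytes val) {
            ReaderBytes rb = (ReaderBytes) val; // still only reader-made instances
            synchronized (streamLock) {
                rb.bind(this, next++);
            }
            return rb.getSize();
        }
    }
}
```

Both implementations remain private to the reader, so the locking discipline 
is decided in one place and callers see only the extra boolean.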

> Change modifier of SequenceFile.CompressedBytes and 
> SequenceFile.UncompressedBytes from private to public
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5553
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5553
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5553-2.patch, Hadoop-5553-3.patch, 
> Hadoop-5553.patch
>
>
> SequenceFile.rawValue() provides the only interface to navigate the 
> underlying bytes, and with a little work on a customized ValueBytes 
> implementation one can avoid reading all bytes into memory. Unfortunately, 
> the current nextRawValue casts the ValueBytes passed to it to either the 
> private class CompressedBytes or the private class UncompressedBytes, which 
> prevents further extension by users.
> I cannot see any reason why CompressedBytes and UncompressedBytes should be 
> private. Since ValueBytes is public and nextRawValue() casts it to either 
> CompressedBytes or UncompressedBytes, I think it would be better if they 
> were public.
> I am stuck on this issue now and would really appreciate it being resolved 
> as soon as possible.
