[ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223927#comment-13223927
 ] 

jirapos...@reviews.apache.org commented on HDFS-2834:
-----------------------------------------------------



bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > some stuff around error cases here -- I think you'd run into these bugs 
if you hit a checksum error, since the "retry on different node" path would 
trigger in DFSInputStream, but the target buffer's position would be 
prematurely updated from prior attempts.
bq.  > 
bq.  > But getting close!
bq.  > 
bq.  > Is it possible to split out the test refactor change from this one? Hard 
to tell what's changed in the tests vs just refactored. If it's a pain in the 
butt I'll look more closely.

It's not super easy, unfortunately - let me know if it's a big problem and I'll 
perform the git surgery required (or provide annotations on the review, if that 
would help).

Sounds like I need to write tests against a file with a malformed checksum; 
I'll post another patch when I'm done. 

I'll respond to the libhdfs changes in a separate update. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
 line 491
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88647#file88647line491>
bq.  >
bq.  >     do you really need synchronization here and below? it seems like the 
strategy is being called from a synchronized context, so this is redundant. 
Removing this also allows the inner classes to be static which I think is 
preferable

The inner classes depend on blockReader, which is why they're non-static. The 
synchronisation is (hopefully) to calm Findbugs since it gives a false positive 
when blockReader is perhaps accessed from a non-synchronized context (although 
the strategy classes are always called from a method which already has the 
lock).

We can kill two birds with one stone by making the classes static and passing 
blockReader as a parameter either at construction or read-time; findBugs 
probably won't follow the aliasing and so won't try and understand the locks.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 268-270
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line268>
bq.  >
bq.  >     I envision some silly user setting the buffer size < 
bytesPerChecksum and getting screwed here. Worth a check for valid range here 
(throwing IOE or IllegalArgumentException or something)

Hah, I was just going to let them have 0-byte reads :) Yes, throwing is a 
better idea. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 123-133
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line123>
bq.  >
bq.  >     please use javadoc style comments for these, rather than //s

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java,
 line 22
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88644#file88644line22>
bq.  >
bq.  >     unused import?

Good catch. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 357
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line357>
bq.  >
bq.  >     should this line go in a finally{} clause? Otherwise if 'to' doesn't 
have enough space for the copy, we'd end up leaving 'from' with the modified 
limit

The callers are supposed to guarantee that 'to' has enough space, but it does 
no harm to be defensive. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 397
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line397>
bq.  >
bq.  >     in finally{} clause perhaps, so the limit isn't messed up by a 
failed read

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 459
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line459>
bq.  >
bq.  >     the error conditions here don't quite work right.
bq.  >     For example, if a checksum error occurs, the buffer's position will 
still be updated.
bq.  >     
bq.  >     Above, there's a similar problem if "phase 1" succeeds but "phase 2" 
encounters an error -- the position will change but the method will throw.

I think it might make sense for the client of BlockReaderLocal to do the 
cleanup, since they might have to do some reset logic of their own anyhow. In 
particular, DFSInputStream seeks back to the beginning of the read. We might as 
well keep all the compensation logic in the same place, and DFSInputStream is 
best placed to know what it wants to do. Plus it's a bit easier to wrap a 
single read call in a try / catch block than to try and reason about individual 
failure modes inside the read method. Does that sound sensible?


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 300-301
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line300>
bq.  >
bq.  >     This isn't your bug, but I just realized there's a potential leak 
here if the skip() calls below failed. I think we need to add a try..finally{} 
which returns these buffers to the pool if construction fails

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 517-521
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line517>
bq.  >
bq.  >     no '-'s needed in javadoc

Done.


- Henry


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4212/#review5665
-----------------------------------------------------------


On 2012-03-06 23:14:07, Henry Robinson wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4212/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-06 23:14:07)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  New patch for HDFS-2834 (I can't update the old review request).
bq.  
bq.  
bq.  This addresses bug HDFS-2834.
bq.      http://issues.apache.org/jira/browse/HDFS-2834
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
 dfab730 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
 cc61697 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 4187f1c 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
 2b817ff 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
 b7da8d4 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
 ea24777 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
 9d4f4a2 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
 bbd0012 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
 eb2a1d8 
bq.  
bq.  Diff: https://reviews.apache.org/r/4212/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Henry
bq.  
bq.


                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, 
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, 
> HDFS-2834.8.patch, HDFS-2834.patch, HDFS-2834.patch, 
> hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
> {{byte[]}}. Although for many clients this is desired behaviour, in certain 
> situations, such as native-reads through libhdfs, this imposes an extra copy 
> penalty since the {{byte[]}} needs to be copied out again into a natively 
> readable memory area. 
> For these cases, it would be preferable to allow the client to supply its own 
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to