[ 
https://issues.apache.org/jira/browse/CRUNCH-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512638#comment-14512638
 ] 

Josh Wills commented on CRUNCH-513:
-----------------------------------

Hey [~anelson425], I don't quite get what's going on here. What do the child 
paths look like such that they aren't picked up by either a) the glob or b) the 
isDir() check that processes the paths the glob finds? Is there another check 
we could add, like the isDir() one, that would pick them up?

> HFileSource not calculating size correctly for nested paths
> -----------------------------------------------------------
>
>                 Key: CRUNCH-513
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-513
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Andy Nelson
>         Attachments: Crunch-513.patch
>
>
> The cause is that getInternalSize [1] does not traverse child paths when 
> computing the size. 
> I have a fix in a patch that I will attach, but I have not been able to 
> extend the integration tests to reproduce this failure. The issue only 
> appears to occur when using DistributedFileSystem, whereas the tests for 
> HFileSource use RawLocalFileSystem. I see there are additional tests that 
> use the Hadoop mini cluster, but I was not able to get one working 
> correctly.
> [1] 
> https://github.com/apache/crunch/blob/apache-crunch-0.8.3/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HFileSource.java#L116
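
For readers following the thread, here is a minimal sketch of the difference between a size calculation that stops at the first directory level and one that descends into child paths. It is not the HFileSource code and not the attached patch: it uses java.nio.file on the local file system as a stand-in for Hadoop's FileSystem API, and the class name, method names, and the region/family directory layout are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration only; not taken from Crunch.
public class NestedSizeSketch {

    // Sums only the regular files directly under dir; nested directories are skipped.
    static long shallowSize(Path dir) throws IOException {
        long total = 0;
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                if (Files.isRegularFile(child)) {
                    total += Files.size(child);
                }
            }
        }
        return total;
    }

    // Descends into every child directory and sums the files found at any depth.
    static long recursiveSize(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            return Files.size(path);
        }
        if (!Files.isDirectory(path)) {
            return 0;
        }
        long total = 0;
        try (Stream<Path> children = Files.list(path)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                total += recursiveSize(child);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout for the example: root/region/family/hfile (1024 bytes).
        Path root = Files.createTempDirectory("hfile-sketch");
        Path family = Files.createDirectories(root.resolve("region").resolve("family"));
        Files.write(family.resolve("hfile"), new byte[1024]);

        System.out.println("shallow:   " + shallowSize(root));   // prints "shallow:   0"
        System.out.println("recursive: " + recursiveSize(root)); // prints "recursive: 1024"
    }
}
```

With a Hadoop FileSystem, the analogous building blocks would presumably be FileSystem.listStatus and FileStatus.getLen; whether the patch takes that route is not stated in this thread.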



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
