[ 
https://issues.apache.org/jira/browse/ACCUMULO-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274819#comment-13274819
 ] 

Keith Turner commented on ACCUMULO-456:
---------------------------------------

The procedure I pointed out earlier w/ compacting the table is something that a 
user could do now w/ existing code.  For future code changes I think 
generalizing the chop compaction used by merge would be a good thing to do.  
This way only the files that need to be compacted are compacted, which minimizes 
the amount of decompression, deserialization, serialization, and compression 
that needs to be done.  I think chop+distcp is a good way to go.  distcp is a 
well-tested tool that copies bytewise and does not decompress, etc.  The 
identity map reduce operation suggested above would be more efficient when all 
files need to be chopped, but I am not sure this will be the usual case.  When 
only a small number of files need to be chopped the identity map reduce will 
result in a lot more CPU load than chop+distcp.  I suppose the ultimate 
optimization is a map reduce job that copies bytewise when no chop is needed 
and does the chop as part of the map reduce job when needed.  This would be a 
fairly complex bit of code that may not get the testing it needs.
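
To make the trade-off above concrete, here is a minimal sketch of the decision 
that a generalized chop would have to make per file: copy it bytewise, or 
rewrite it first.  This is not Accumulo code.  The names RowRange, FileInfo, 
needs_chop and plan_export are hypothetical, and the sketch assumes a tablet 
covers rows in (prev_end, end] and that each file's first and last row are 
known.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RowRange:
    """Hypothetical tablet range: rows in (prev_end, end]; None = unbounded."""
    prev_end: Optional[str]
    end: Optional[str]


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical summary of one data file: its path and row extent."""
    path: str
    first_row: str
    last_row: str


def needs_chop(f: FileInfo, tablet: RowRange) -> bool:
    """True if the file holds rows outside the tablet's range."""
    below = tablet.prev_end is not None and f.first_row <= tablet.prev_end
    above = tablet.end is not None and f.last_row > tablet.end
    return below or above


def plan_export(files: List[FileInfo], tablet: RowRange) -> Tuple[List[str], List[str]]:
    """Split files into (copy bytewise as-is, chop before copying)."""
    copy: List[str] = []
    chop: List[str] = []
    for f in files:
        (chop if needs_chop(f, tablet) else copy).append(f.path)
    return copy, chop
```

If the chop list is usually short, as guessed above, most files never get 
decompressed or reserialized and distcp does the bulk of the work.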

Making bulk import handle multiple dirs would be a nice convenience feature for 
users.  At the moment it's fairly easy to work around w/ one hadoop command for 
anyone trying to do this w/ the current system.

{noformat}
  hadoop fs -mv <table dir>/*/*.rf <bulk import dir>
{noformat}
                
> Need utility for exporting and importing tables
> -----------------------------------------------
>
>                 Key: ACCUMULO-456
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-456
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
>
>
> Need a utility to export and import tables.  A use case would be export 
> table on cluster A, distcp to cluster B, import.  


