EdColeman commented on issue #742: How to move Accumulo table data to HDFS 
which has different structure 
URL: https://github.com/apache/accumulo/issues/742#issuecomment-435732114
 
 
   I believe that you are correct. The Accumulo example uses distcp because 
that is the most common way of transferring files between clusters.  How you 
get the files to the destination system is really up to you.  If you can get 
the files listed in distcp.txt and the exportMetadata.zip file into a single 
directory on the destination system, the import command does not care how they 
got there, and you should be good to go.
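
   Since the only requirement is that every file ends up in that one 
directory, a quick check before running the import can save a failed attempt. 
The sketch below is only an illustration: missing_from_dest is a hypothetical 
helper, not part of Accumulo or Hadoop. It assumes you have a local copy of 
distcp.txt and a text file with one destination path per line, and it compares 
base names only.

```shell
#!/bin/sh
# Hypothetical helper (not an Accumulo or Hadoop tool): print the base names
# that the import will expect but that do not appear in the destination listing.
#   $1 = local copy of distcp.txt (one source path per line)
#   $2 = text file with one destination path per line
# Returns 0 when nothing is missing, 1 otherwise.
missing_from_dest() {
    # Expected names: every entry in distcp.txt plus exportMetadata.zip.
    expected=$( { sed 's|.*/||' "$1"; echo exportMetadata.zip; } \
                | sed '/^$/d' | sort -u )
    # Names actually present at the destination (directory part stripped).
    present=$(sed 's|.*/||' "$2" | sort -u)
    # Keep the expected names that match no line of the destination listing
    # (-x whole line, -F fixed strings, -v invert the match).
    missing=$(printf '%s\n' "$expected" | grep -vxF -e "$present" || true)
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}
```

   The destination listing could come from something like 
"hdfs dfs -ls -C /bar/foo > dest.txt", assuming your Hadoop version supports 
the -C (paths only) option; any other way of producing one path per line works 
just as well.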
   
   The Accumulo table import command (importtable) takes a table name and the 
HDFS directory on the destination system.  The only HDFS URL needed is the 
URL / path to the directory with the files. 
   
   > importtable table_name hdfs_path_to_files
   
   The import uses the exportMetadata.zip file to recreate the table metadata 
and settings (including the splits) and then moves the rfiles in the provided 
directory into / under the Accumulo directory structure (i.e. 
xxx/accumulo/tables/new_table_id/...rf)
   
   The paths in the distcp.txt file are a convenience for using the distcp 
command -f option  (see 
https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html):
   
   > bash$ hadoop distcp -f hdfs://nn1:8020/distcp.txt hdfs://nn2:8020/bar/foo
   
   distcp uses the paths listed in the provided file (distcp.txt in this case) 
and copies them to the foo directory on the destination system.  The result of 
distcp on the destination system is a single directory containing the Accumulo 
rfiles and the exportMetadata.zip file. 
   
   The alternate form of the distcp command takes a file list and the 
destination system directory, so you could do something like:
   
   > hadoop distcp hdfs://nn1/accumulo/tables/yy/Axxxx1.rf 
hdfs://nn1/accumulo/tables/yy/Axxxxx2.rf hdfs://nn1/path_to_exportMetadata.zip  
hdfs://nn2/foo/bar  
   
   If you pass all of the paths from distcp.txt, including the 
exportMetadata.zip file, to distcp as arguments, the result is the same: a 
single directory on the destination system with the Accumulo rfiles and the 
exportMetadata.zip file.  It's just much easier in this case to give it the 
files with -f.
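
   To make the equivalence of the two forms concrete, here is a small sketch 
that turns a local copy of distcp.txt into the explicit-argument command. 
print_distcp_cmd is a hypothetical helper name. It only prints the command so 
you can review it before running anything, and it assumes the paths contain no 
spaces.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the explicit-argument distcp command
# that corresponds to "hadoop distcp -f <list> <dest>".
#   $1 = local copy of distcp.txt (one source path per line)
#   $2 = destination directory URL
print_distcp_cmd() {
    # Drop blank lines, then join the remaining paths with single spaces.
    sources=$(sed '/^[[:space:]]*$/d' "$1" | paste -s -d ' ' -)
    printf 'hadoop distcp %s %s\n' "$sources" "$2"
}
```

   For example, "print_distcp_cmd distcp.txt hdfs://nn2:8020/bar/foo" prints 
one long command line listing every source path followed by the destination.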
   
   You don't need to use distcp; you just need to end up with the same 
expected directory containing all of the files. Hope this helps.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
