ivakegg opened a new issue #1963: URL: https://github.com/apache/accumulo/issues/1963
This is similar to #650, but different enough that I thought it warranted a separate ticket. This is related to the 1.x versions.

The problem is being able to verify definitively that a bulk-imported file was successfully loaded into the system. This requires being able to determine what the file is renamed to during the bulk import process. Given that information, we would be able to scan the accumulo.metadata table to find its matching entry.

We realize that there is a race condition here, in which the GC could have removed the file before verification could take place. That situation could be handled by looking in the GC logs, which is not very clean but doable. We could of course monitor the master log to determine the file mapping as well, but I was hoping for a cleaner solution.

One possibility is to include the name of the original file in the key or value within the file column family of the accumulo.metadata table. Another possibility is to have the master pass back the list of file name mappings to the client.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
