[
https://issues.apache.org/jira/browse/HIVE-18696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370275#comment-16370275
]
Marta Kuczora commented on HIVE-18696:
--------------------------------------
The idea behind the first patch is:
1) Separate the partition validation from starting the tasks which create the
partition folders.
Instead of checking the partitions and submitting the tasks in one loop, the
validation is done in a separate loop. First iterate through the
partitions, validate the table/db names, and check for duplicates. Then, if all
partitions are valid, submit the tasks that create the partition folders in a
second loop. This way, if one of the partitions is invalid, the
exception is thrown in the first loop, before any task is submitted. So
we can be sure that no partition folder is created if the list contains an
invalid partition.
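The two-loop structure described in 1) could be sketched roughly as follows. This is not the code from the patch: the class ValidateThenSubmitSketch, the Part type, the addPartitions method and the createdFolders list are hypothetical stand-ins used only for illustration, and "creating a folder" is simulated by adding a name to a list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ValidateThenSubmitSketch {
    // Hypothetical stand-in for a metastore partition.
    static final class Part {
        final String dbName;
        final String tableName;
        final String values;

        Part(String dbName, String tableName, String values) {
            this.dbName = dbName;
            this.tableName = tableName;
            this.values = values;
        }
    }

    // createdFolders must be thread-safe; it only simulates folder creation.
    static List<Future<String>> addPartitions(ExecutorService pool, String dbName,
            String tblName, List<Part> parts, List<String> createdFolders) {
        // First loop: validate everything; nothing has been submitted yet.
        Set<String> seen = new HashSet<>();
        for (Part part : parts) {
            if (!part.tableName.equals(tblName) || !part.dbName.equals(dbName)) {
                throw new IllegalArgumentException(
                    "Partition does not belong to target table: " + part.values);
            }
            if (!seen.add(part.values)) {
                throw new IllegalArgumentException(
                    "Duplicate partitions in the list: " + part.values);
            }
        }
        // Second loop: only reached if every partition passed validation.
        List<Future<String>> futures = new ArrayList<>();
        for (Part part : parts) {
            futures.add(pool.submit(() -> {
                createdFolders.add(part.values); // simulated folder creation
                return part.values;
            }));
        }
        return futures;
    }
}
```

With this shape, a list containing one partition of a different table makes the method throw before any task is submitted, so createdFolders stays empty.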
2) Handle the exceptions which occur during the execution of the tasks
differently.
Previously, if an exception occurred in one task, the remaining tasks were
canceled, and the newly created partition folders were cleaned up in the
finally block. The problem was that some tasks could still be creating their
folders while the others were being cleaned up, so there
could be leftover folders. Testing showed that
this case cannot be avoided completely when canceling the tasks.
The idea of this patch is to set a flag if an exception is thrown in one of the
tasks. The flag is visible to the tasks, and if its value is true, the
partition folders won't be created. Then iterate through the remaining tasks
and wait for them to finish. Tasks that started before the flag was
set will finish creating their partition folders. Tasks that
start after the flag was set won't create partition folders, to avoid
unnecessary work. This guarantees that all tasks have finished when
entering the finally block where the partition folders are cleaned up.
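A minimal sketch of the flag-based handling described in 2), under stated assumptions: the class FailureFlagSketch, the method addFolders and the failingName parameter are hypothetical and not taken from the patch, folder creation is simulated with a concurrent set, and interruption handling is deliberately kept minimal.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

class FailureFlagSketch {
    // Returns the "folders" that remain after the call (empty if any task failed).
    static Set<String> addFolders(List<String> names, String failingName,
            ExecutorService pool) {
        Set<String> created = ConcurrentHashMap.newKeySet();
        AtomicBoolean failed = new AtomicBoolean(false);
        List<Future<?>> futures = new ArrayList<>();
        for (String name : names) {
            futures.add(pool.submit(() -> {
                if (failed.get()) {
                    return null; // flag already set: skip the unnecessary work
                }
                if (name.equals(failingName)) {
                    throw new IOException("Cannot create folder for " + name);
                }
                created.add(name); // simulated folder creation
                return null;
            }));
        }
        try {
            // Wait for every task instead of canceling the remaining ones.
            for (Future<?> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    failed.set(true);
                } catch (InterruptedException e) {
                    // Not handled robustly in this sketch.
                    Thread.currentThread().interrupt();
                    failed.set(true);
                }
            }
        } finally {
            // Barring interruption, all tasks have finished here,
            // so "created" is complete and can be cleaned up safely.
            if (failed.get()) {
                created.clear();
            }
        }
        return created;
    }
}
```

The design choice illustrated here is waiting on every future rather than calling cancel: a task that is already running is allowed to finish, so the cleanup never races with a folder that is still being created.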
> The partition folders might not get cleaned up properly in the
> HiveMetaStore.add_partitions_core method if an exception occurs
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18696
> URL: https://issues.apache.org/jira/browse/HIVE-18696
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Major
> Attachments: HIVE-18696.1.patch
>
>
> When adding multiple partitions and one of them cannot be created
> successfully, none of the partitions are created, but the folders might not
> be cleaned up properly. See the test case "testAddPartitionsOneInvalid" in
> the TestAddPartitions test.
> This is the problematic code in the HiveMetaStore.add_partitions_core method:
> {code:java}
> for (final Partition part : parts) {
>   if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
>     throw new MetaException("Partition does not belong to target table "
>         + dbName + "." + tblName + ": " + part);
>   }
>   boolean shouldAdd = startAddPartition(ms, part, ifNotExists);
>   if (!shouldAdd) {
>     existingParts.add(part);
>     LOG.info("Not adding partition " + part + " as it already exists");
>     continue;
>   }
>   final UserGroupInformation ugi;
>   try {
>     ugi = UserGroupInformation.getCurrentUser();
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
>   partFutures.add(threadPool.submit(new Callable<Partition>() {
>     @Override
>     public Partition call() throws Exception {
>       ugi.doAs(new PrivilegedExceptionAction<Object>() {
>         @Override
>         public Object run() throws Exception {
>           try {
>             boolean madeDir = createLocationForAddedPartition(table, part);
>             if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
>               // Technically, for ifNotExists case, we could insert one and discard the other
>               // because the first one now "exists", but it seems better to report the problem
>               // upstream as such a command doesn't make sense.
>               throw new MetaException("Duplicate partitions in the list: " + part);
>             }
>             initializeAddedPartition(table, part, madeDir);
>           } catch (MetaException e) {
>             throw new IOException(e.getMessage(), e);
>           }
>           return null;
>         }
>       });
>       return part;
>     }
>   }));
> }
> {code}
> When going through the partitions, let's say the tasks that create the
> folders are successfully submitted for the first two partitions, but an
> exception occurs for the third partition in the code before its task is
> submitted. (This can happen if the partition has a different table or db
> name than the others, or if it has an invalid value.)
> In this case the execution jumps to the finally block, where the folders
> in the "addedPartitions" map are cleaned up. However, the threads for the
> first two partitions may not have finished creating their folders yet, so
> the map can be empty or contain only one of the partitions.
> This issue also happens in the HiveMetaStore.add_partitions_pspec_core
> method, as this part of the code is the same as in the add_partitions_core
> method.