[ 
https://issues.apache.org/jira/browse/IMPALA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836804#comment-16836804
 ] 

Vihang Karajgaonkar commented on IMPALA-8479:
---------------------------------------------

The snippet provided above reproduces the issue because of how the metastore 
provides transaction isolation. By default, the metastore uses read-committed 
transaction isolation (unless the user has overridden the 
{{datanucleus.transactionIsolation}} configuration on the metastore to 
{{repeatable-read}}, which is highly unlikely).

When Impala refreshes a table, it requests partitions in two steps. First it 
fetches all the partition names, and then it requests the partitions for the 
fetched names. The two steps happen in separate transactions. That alone should 
be fine, since we don't verify that the number of fetched names equals the 
number of partitions received. The problem occurs when, for example, another 
transaction is dropping some partitions of the same table at the same time. 
Since the directSQL path fetches partitions by issuing many select statements, 
another transaction can remove some of the partitions between two of the select 
statements fired by the {{get_partitions_by_names}} API. In that case, a 
partially filled partition can be returned, which is what triggers the error 
here. For example, the following sequence of events can cause this problem.

Session 1 (S1) calls the {{get_partitions_by_names}} API, which internally 
issues many select statements, while session 2 (S2) drops a partition. The 
interleaving looks something like this:

S1: {{get_partitions_by_names}}

S1:    --> open transaction

S1:    --> select all partition ids for the given table and names

S2: {{drop_partition_by_name}}

S2:    --> open transaction

S1:    --> select on the params table to fetch parameters, etc.

S2:    --> drop a partition by name

S2:    --> commit

S1:    --> select on the partition values table to fetch partition values. 
This will not return values for the partitions dropped by S2, so all such 
partitions will be incomplete and cause this problem on the Impala side.

S1:    --> commit

S1: {{get_partitions_by_names}} returns some partially filled partitions.
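
The interleaving above can be replayed with a toy model. This is a minimal 
Python sketch, not metastore code: plain dictionaries stand in for the backing 
database tables, and the function names, the {{between_selects}} hook, and the 
table layout are all hypothetical. It only illustrates how a commit that lands 
between two selects leaves a partition with an empty values list.

```python
# Toy model of the race. In-memory "tables" stand in for the metastore's
# backing database; every name here is illustrative, not real HMS code.

def make_db():
    return {
        "partitions": {1: "part=1", 2: "part=2"},            # part_id -> name
        "params": {1: {"numFiles": "1"}, 2: {"numFiles": "1"}},
        "key_vals": {1: ["1"], 2: ["2"]},                    # part_id -> values
    }

def drop_partition(db, part_id):
    # S2: one committed transaction that removes the rows from every table.
    for table in db.values():
        table.pop(part_id, None)

def get_partitions_by_names(db, names, between_selects=None):
    # S1, select 1: resolve partition ids for the requested names.
    ids = [pid for pid, name in db["partitions"].items() if name in names]
    # S1, select 2: fetch parameters for those ids.
    params = {pid: dict(db["params"].get(pid, {})) for pid in ids}
    # With read-committed isolation, another session's commit becomes
    # visible to S1's later statements. The hook simulates that commit.
    if between_selects is not None:
        between_selects(db)
    # S1, select 3: fetch partition values. Rows dropped by S2 are gone,
    # so those partitions come back with an empty values list.
    return [
        {"id": pid, "params": params[pid],
         "values": list(db["key_vals"].get(pid, []))}
        for pid in ids
    ]

db = make_db()
result = get_partitions_by_names(
    db, {"part=1", "part=2"},
    between_selects=lambda d: drop_partition(d, 2),
)
incomplete = [p for p in result if not p["values"]]
```

Here {{incomplete}} holds the partition with id 2: its parameters were read 
before the drop, but its values list is empty, which is the shape of result 
that makes the partition value parsing on the Impala side fail.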

In my opinion, this is a problem on the metastore side. The solution is either 
to fetch all the partition information in one select statement that joins all 
the relevant tables, or to handle such cases in metastore code after the 
results are fetched (at least in the case of drops, make sure such partitions 
are not returned). The second approach will still have the same problem when a 
partition is altered from another session.
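
To make the two options concrete, here is a rough Python sketch using an 
in-memory SQLite database. The table and column names are simplified stand-ins 
modeled on the metastore partition tables, and both helper functions are 
hypothetical. The sketch also assumes that a single statement sees a consistent 
view of the data, which holds for read-committed isolation in the common 
databases I am aware of but should be checked for the backing database in use.

```python
import sqlite3

# Simplified stand-ins for the metastore partition tables. The real schema
# has more tables and columns; this is only enough to show the idea.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT);
    CREATE TABLE PARTITION_KEY_VALS (
        PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
    INSERT INTO PARTITIONS VALUES (1, 'part=1'), (2, 'part=2');
    INSERT INTO PARTITION_KEY_VALS VALUES (1, '1', 0), (2, '2', 0);
""")

def fetch_partitions_single_statement(conn, names):
    # Option 1: names and values come from one joined statement, so a
    # partition is either returned with its values or not returned at all.
    placeholders = ",".join("?" for _ in names)
    rows = conn.execute(
        "SELECT p.PART_NAME, v.PART_KEY_VAL "
        "FROM PARTITIONS p "
        "JOIN PARTITION_KEY_VALS v ON v.PART_ID = p.PART_ID "
        f"WHERE p.PART_NAME IN ({placeholders}) "
        "ORDER BY p.PART_ID, v.INTEGER_IDX",
        list(names),
    ).fetchall()
    result = {}
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result

def drop_incomplete(partitions, num_partition_keys):
    # Option 2: after a multi-statement fetch, discard any partition whose
    # value count does not match the table's number of partition keys.
    return {name: vals for name, vals in partitions.items()
            if len(vals) == num_partition_keys}

before = fetch_partitions_single_statement(conn, ["part=1", "part=2"])

# Another session drops part=2 and commits.
conn.execute("DELETE FROM PARTITION_KEY_VALS WHERE PART_ID = 2")
conn.execute("DELETE FROM PARTITIONS WHERE PART_ID = 2")
conn.commit()

after = fetch_partitions_single_statement(conn, ["part=1", "part=2"])
filtered = drop_incomplete({"part=1": ["1"], "part=2": []}, 1)
```

Note that {{drop_incomplete}} only catches drops. A partition altered by 
another session still has the right number of values, so it passes the filter 
with possibly mixed old and new data, which is the limitation of the second 
approach mentioned above.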

On the other hand, it could also be argued that Impala needs to play well with 
other HMS clients (like Hive) by taking shared locks when reading table 
metadata, so that other clients are blocked from dropping partitions while 
Impala is reading them. Once we start supporting transactional tables in 
Impala, this problem will hopefully be resolved on its own.

The issue should be rare in practice unless Hive and Impala run concurrent 
workloads on the same set of tables. In such cases, it is recommended that the 
metastore use the {{repeatable-read}} transaction isolation level to avoid this 
problem. This can be done by setting the following configuration in the 
metastore's hive-site.xml:

{code}
<property>
  <name>datanucleus.transactionIsolation</name>
  <value>repeatable-read</value>
</property>
{code}
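
As a quick sanity check that the setting is actually present in the file the 
metastore reads, something like the following Python sketch could be used. The 
helper name {{get_hive_site_property}} is made up for this example, and the 
inline sample stands in for the real hive-site.xml, whose location depends on 
the deployment.

```python
import xml.etree.ElementTree as ET

def get_hive_site_property(xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

# Inline sample standing in for the metastore's hive-site.xml.
sample = """<configuration>
  <property>
    <name>datanucleus.transactionIsolation</name>
    <value>repeatable-read</value>
  </property>
</configuration>"""

isolation = get_hive_site_property(sample, "datanucleus.transactionIsolation")
```

A result of {{None}} means the property is not set in that file, in which case 
the read-committed default described above applies.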

> REFRESH may fail if table metadata mutates during load
> ------------------------------------------------------
>
>                 Key: IMPALA-8479
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8479
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.3.0
>            Reporter: Vincent Tran
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> Reproduction:
>  1) Create a partitioned table
> {noformat}
> create table t1 (c1 int) partitioned by (part int);{noformat}
> 2) Generate a decent number of partitions
> {noformat}
> for i in {1..5000}; do impala-shell.sh -q "insert into t1 partition(part=$i) 
> values ($i)"; done
> {noformat}
> 3) Start an IM;REFRESH; loop
> {noformat}
> while :; do impala-shell.sh -q "invalidate metadata; refresh t1;"; 
> done{noformat}
>  4) Start dropping partitions in Hive.
> {noformat}
> for i in {1..5000}; do hive -e "alter table t1 drop partition (part=$i)"; 
> done{noformat}
> Eventually, when the "REFRESH" and "ALTER TABLE ... DROP ..." coincide in 
> HMS, catalogd will encounter this TableLoadingException (as it appears on trunk)
> {noformat}
> I0501 14:06:14.552676 38522 TableLoadingMgr.java:70] Loading metadata for 
> table: vt.t1
> I0501 14:06:14.552776 38927 TableLoader.java:61] Loading metadata for: vt.t1
> I0501 14:06:14.552850 38522 TableLoadingMgr.java:72] Remaining items in 
> queue: 0. Loads in progress: 1
> I0501 14:06:14.566756 38927 HdfsTable.java:941] Fetching partition metadata 
> from the Metastore: vt.t1
> I0501 14:06:16.305446 38927 HdfsTable.java:945] Fetched partition metadata 
> from the Metastore: vt.t1
> I0501 14:06:16.367607 38927 TableLoader.java:101] Loaded metadata for: vt.t1 
> (1814ms)
> I0501 14:06:16.368847 38522 jni-util.cc:256] 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: vt.t1
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:956)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:877)
>         at org.apache.impala.catalog.TableLoader.load(TableLoader.java:84)
>         at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241)
>         at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:238)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: Cannot parse partition values 
> '[]' for table vt.t1: expected %d values but got %d [1, 0]
>         at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
>         at 
> org.apache.impala.catalog.FeCatalogUtils.parsePartitionKeyValues(FeCatalogUtils.java:224)
>         at 
> org.apache.impala.catalog.HdfsTable.createPartition(HdfsTable.java:698)
>         at 
> org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:532)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:946)
>         ... 8 more
> I0501 14:06:16.440848 38522 status.cc:124] TableLoadingException: Failed to 
> load metadata for table: vt.t1
> CAUSED BY: IllegalArgumentException: Cannot parse partition values '[]' for 
> table vt.t1: expected %d values but got %d [1, 0]
>     @          0x1a91ff0  impala::Status::Status()
>     @          0x221518c  impala::JniUtil::GetJniExceptionMsg()
>     @          0x1a7a369  impala::JniCall::Call<>()
>     @          0x1a78475  impala::JniUtil::CallJniMethod<>()
>     @          0x1a7663c  impala::Catalog::ResetMetadata()
>     @          0x1a4e5df  CatalogServiceThriftIf::ResetMetadata()
>     @          0x1aea49d  
> impala::CatalogServiceProcessor::process_ResetMetadata()
>     @          0x1ae8b9b  impala::CatalogServiceProcessor::dispatchCall()
>     @          0x1a36feb  apache::thrift::TDispatchProcessor::process()
>     @          0x1e8e8a0  
> apache::thrift::server::TAcceptQueueServer::Task::run()
>     @          0x1e8509e  impala::ThriftThread::RunRunnable()
>     @          0x1e867c4  boost::_mfi::mf2<>::operator()()
>     @          0x1e8665a  boost::_bi::list3<>::operator()<>()
>     @          0x1e863a6  boost::_bi::bind_t<>::operator()()
>     @          0x1e862b9  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @          0x1daa4eb  boost::function0<>::operator()()
>     @          0x2286cd0  impala::Thread::SuperviseThread()
>     @          0x228f054  boost::_bi::list5<>::operator()<>()
>     @          0x228ef78  boost::_bi::bind_t<>::operator()()
>     @          0x228ef3b  boost::detail::thread_data<>::run()
>     @          0x37967b9  thread_proxy
>     @     0x7f1dc6fef6b9  start_thread
>     @     0x7f1dc6d2541c  clone
> E0501 14:06:16.440897 38522 catalog-server.cc:122] TableLoadingException: 
> Failed to load metadata for table: vt.t1
> CAUSED BY: IllegalArgumentException: Cannot parse partition values '[]' for 
> table vt.t1: expected %d values but got %d [1, 0]
> {noformat}
> In earlier versions, this may manifest as a similar in context but distinct 
> message.
> In Impala 2.10, for example:
> {noformat}
> I0501 09:51:20.349366 21296 CatalogServiceCatalog.java:996] Refreshing table 
> metadata: default.pt1
> I0501 09:51:20.359350 21296 HdfsTable.java:1067] Incrementally loading table 
> metadata for: default.pt1
> I0501 09:51:20.393787 21296 jni-util.cc:211] 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: pt1
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1092)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1020)
>         at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:1016)
>         at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:3067)
>         at 
> org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:156)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>         at java.util.ArrayList.get(ArrayList.java:429)
>         at 
> org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:155)
>         at 
> org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:135)
>         at 
> org.apache.impala.catalog.HdfsPartition.getPartitionName(HdfsPartition.java:491)
>         at 
> org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1171)
>         at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1073)
>         ... 4 more
> I0501 09:51:20.402027 21296 status.cc:122] TableLoadingException: Failed to 
> load metadata for table: pt1
> CAUSED BY: IndexOutOfBoundsException: Index: 0, Size: 0
>     @           0x83efc9  impala::Status::Status()
>     @           0xb747d2  impala::JniUtil::GetJniExceptionMsg()
>     @           0x8317cb  impala::Catalog::ResetMetadata()
>     @           0x8244db  CatalogServiceThriftIf::ResetMetadata()
>     @           0x8cb329  
> impala::CatalogServiceProcessor::process_ResetMetadata()
>     @           0x8c8f39  impala::CatalogServiceProcessor::dispatchCall()
>     @           0x80ecec  apache::thrift::TDispatchProcessor::process()
>     @           0x9db65f  
> apache::thrift::server::TAcceptQueueServer::Task::run()
>     @           0x9d5c79  impala::ThriftThread::RunRunnable()
>     @           0x9d6a52  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @           0xbd6ff2  impala::Thread::SuperviseThread()
>     @           0xbd7754  boost::detail::thread_data<>::run()
>     @           0xe6418a  (unknown)
>     @     0x7f0faf4a4dd5  start_thread
>     @     0x7f0faf1cdead  __clone
> E0501 09:51:20.402400 21296 catalog-server.cc:82] TableLoadingException: 
> Failed to load metadata for table: pt1
> CAUSED BY: IndexOutOfBoundsException: Index: 0, Size: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
