GitHub user manishgupta88 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/294
[CARBONDATA-381] Unnecessary catalog metadata refresh and array index out of
bounds exception in drop table
Problem:
1. Whenever the catalog metadata is refreshed, the timestamp of the
modifiedTime.mdt file is modified, which leads to unnecessary refreshes of
the complete catalog metadata.
2. An array index out of bounds exception is thrown when table creation fails.
Analysis:
1. Whenever the carbon environment is initialized, it loads the table metadata
into the catalog and changes the timestamp of the modifiedTime.mdt file. If a
parallel beeline session is in progress, this causes an unnecessary refresh
of the catalog metadata in that session.
2. If the very first table creation fails, the exception block tries to drop
that table and clear its metadata. Drop table uses the filter API, which
results in an array index out of bounds exception if the metadata array is
empty.
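
To illustrate analysis item 2, here is a minimal Scala sketch of how a
filter-then-index lookup can fail on an empty catalog. TableMeta,
firstMatchByFilter and the "default"/"t1" names are hypothetical stand-ins
chosen for illustration, not the actual CarbonData classes or code path.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical stand-in for a catalog entry; not the actual CarbonData class.
final case class TableMeta(dbName: String, tableName: String)

object DropTableFailureSketch {
  // Mimics a drop path that filters the catalog and then indexes the first match.
  def firstMatchByFilter(tables: ArrayBuffer[TableMeta], db: String, table: String): TableMeta = {
    val matches = tables.filter(t => t.dbName == db && t.tableName == table)
    matches(0) // throws when no entry matched
  }

  // After a failed first create, the catalog holds no entries at all.
  val emptyCatalog: ArrayBuffer[TableMeta] = ArrayBuffer.empty[TableMeta]

  // Wrapped in Try so the failure can be inspected instead of propagating.
  val attempt: Try[TableMeta] = Try(firstMatchByFilter(emptyCatalog, "default", "t1"))
}
```

The filter call itself succeeds and returns an empty collection; it is the
unconditional access to element 0 that fails.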
Fix:
1. Do not update the timestamp of the modifiedTime.mdt file while loading
metadata. It should only be updated on create and drop table operations.
2. Use the find API instead of the filter API; find returns an Option object.
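
A rough sketch of what fix item 2 could look like, assuming the catalog is a
mutable buffer of entries. TableMeta, lookup and dropIfPresent are
hypothetical names for illustration and may not match the code in the patch.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a catalog entry; not the actual CarbonData class.
final case class TableMeta(dbName: String, tableName: String)

object DropTableFindSketch {
  // find returns Some(entry) for the first match and None when nothing matches,
  // so an empty catalog is an ordinary case rather than an exception.
  def lookup(tables: ArrayBuffer[TableMeta], db: String, table: String): Option[TableMeta] =
    tables.find(t => t.dbName == db && t.tableName == table)

  // Removes the entry if it exists; returns true only when something was removed.
  def dropIfPresent(tables: ArrayBuffer[TableMeta], db: String, table: String): Boolean =
    lookup(tables, db, table) match {
      case Some(meta) =>
        tables -= meta
        true
      case None =>
        false
    }
}
```

With this shape, the cleanup in the exception block can call dropIfPresent
safely even when the failed create never registered the table.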
Impact: carbon catalog refresh, which affects the query and load flows.
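
The refresh behaviour behind fix item 1 can be modelled roughly as follows.
This is an in-memory sketch under stated assumptions: SchemaTimestamp stands
in for the timestamp of the modifiedTime.mdt file, and SessionCatalog,
loadMetadata, createOrDropTable and refreshIfStale are hypothetical names,
not CarbonData APIs.

```scala
// Hypothetical in-memory model of the shared timestamp; the real catalog
// keeps this value as the modification time of the modifiedTime.mdt file.
final class SchemaTimestamp(private var millis: Long) {
  def read: Long = millis
  def touch(now: Long): Unit = { millis = now }
}

// Hypothetical per-session catalog that remembers the last timestamp it saw.
final class SessionCatalog(stamp: SchemaTimestamp) {
  private var lastSeen: Long = stamp.read
  var refreshCount: Int = 0

  // Loading metadata only reads the timestamp; it does not touch it.
  def loadMetadata(): Unit = { lastSeen = stamp.read }

  // Create and drop are the only operations that advance the timestamp.
  def createOrDropTable(now: Long): Unit = {
    stamp.touch(now)
    lastSeen = now
  }

  // Reloads only when another session has changed the schema since the last look.
  def refreshIfStale(): Boolean =
    if (stamp.read != lastSeen) {
      loadMetadata()
      refreshCount += 1
      true
    } else {
      false
    }
}
```

In this model, a second session that merely initializes and loads metadata
leaves the first session's catalog untouched, while a create or drop in one
session triggers exactly one refresh in the other.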
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishgupta88/incubator-carbondata table_meta_refresh_issue
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/294.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #294
----
commit 04d62b54a52b563eab41c2f76c02802bd67aedd9
Author: manishgupta88 <[email protected]>
Date: 2016-11-04T08:36:52Z
----