GitHub user BJangir opened a pull request:
https://github.com/apache/carbondata/pull/2658
[CARBONDATA-2885] Broadcast Issue and Small file distribution Issue
Issue :-
1. For an External Table, the Carbon relation's sizeInByte is wrong (always 0).
Because of this, join queries are selected for broadcast even when the actual
table size is > 10MB (the default broadcast threshold). This makes some join
queries fail (tables that should use sortMergeJoin go for broadcast join
because of the wrong calculation).
2. If Merge_small_file task distribution is enabled, join queries fail (TPCH).
Carbon opens many carbon files but does not close them.
Root Cause :-
1. The current relation size calculation is based on the tablestatus file, but
an External Table does not have a tablestatus file, so zero was always
returned.
2. If Merge_small_file task distribution is enabled, carbon opens many carbon
files but does not close them.
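To illustrate the first root cause, here is a minimal sketch of a size-based broadcast check. It is not Spark or CarbonData source code: the class and method names are hypothetical, and the check assumes a relation is eligible for broadcast when its reported size is non-negative and at most the threshold (Spark's spark.sql.autoBroadcastJoinThreshold defaults to 10 MB). Under that assumption, a reported size of 0 always qualifies, no matter how large the table really is.

```java
// Hypothetical sketch; names are illustrative, not taken from Spark or CarbonData.
final class BroadcastDecision {
    // Default of spark.sql.autoBroadcastJoinThreshold: 10 MB, expressed in bytes.
    static final long DEFAULT_BROADCAST_THRESHOLD = 10L * 1024 * 1024;

    // Assumed rule: broadcast when the reported size is within the threshold.
    static boolean canBroadcast(long reportedSizeInBytes, long threshold) {
        return reportedSizeInBytes >= 0 && reportedSizeInBytes <= threshold;
    }

    public static void main(String[] args) {
        long wronglyReported = 0L;              // what an external table reported
        long actualSize = 500L * 1024 * 1024;   // e.g. a 500 MB table

        // The wrong size passes the check, so the planner would pick broadcast join.
        System.out.println(canBroadcast(wronglyReported, DEFAULT_BROADCAST_THRESHOLD));
        // The real size fails the check, so sort merge join would be used instead.
        System.out.println(canBroadcast(actualSize, DEFAULT_BROADCAST_THRESHOLD));
    }
}
```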
Solution :-
1. If the table is an External Table, calculate the size from TablePath.
2. Close the carbon files once the scan is finished.
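The two ideas in the solution can be sketched together as follows. This is only an illustration under stated assumptions, not the code in this pull request: it uses java.nio on a local directory as a stand-in for whatever file system API the patch actually goes through, and the class TablePathSize and its methods are hypothetical. It sums the sizes of all regular files under a table path and uses try-with-resources so that the handles opened during the walk are released when the traversal finishes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; not the CarbonData implementation.
final class TablePathSize {

    // Sums the sizes of all regular files found under tablePath.
    static long sizeInBytes(Path tablePath) throws IOException {
        // Files.walk keeps directory handles open until the stream is closed,
        // so try-with-resources closes them as soon as the traversal is done.
        try (Stream<Path> files = Files.walk(tablePath)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(TablePathSize::fileSize)
                        .sum();
        }
    }

    // Wraps the checked exception so the method can be used inside a stream.
    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A size computed this way reflects the data actually on disk, so a size-based broadcast check no longer sees 0 for a table that has no tablestatus file.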
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
NA
- [ ] Any backward compatibility impacted?
NA
- [ ] Document update required?
NA
- [ ] Testing done
Manually tested on a 3-node cluster
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BJangir/incubator-carbondata CARBONDATA-2885
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2658.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2658
----
commit 69fe7241e0cef5d7b9a6ac9e87018b3d44dd60a0
Author: BJangir <babulaljangir111@...>
Date: 2018-08-24T09:17:49Z
[CARBONDATA-2885] Broadcast Issue and Small file distribution Issue
----