[
https://issues.apache.org/jira/browse/HIVE-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296111#comment-15296111
]
Gopal V edited comment on HIVE-13818 at 5/23/16 8:50 AM:
---------------------------------------------------------
Updated theory: the issue disappeared when I added {{cast as bigint}} to every
join column, so that the join keys are no longer 4-byte ints (4 + 1 byte = 5).
{code}
select i_item_id,
       s_state,
       avg(ss_quantity) agg1,
       avg(ss_list_price) agg2,
       avg(ss_coupon_amt) agg3,
       avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, store, item
where cast(store_sales.ss_sold_date_sk as bigint) = cast(date_dim.d_date_sk as bigint) and
      cast(store_sales.ss_item_sk as bigint) = cast(item.i_item_sk as bigint) and
      cast(store_sales.ss_store_sk as bigint) = cast(store.s_store_sk as bigint) and
      cast(store_sales.ss_cdemo_sk as bigint) = cast(customer_demographics.cd_demo_sk as bigint) and
      customer_demographics.cd_gender = 'F' and
      customer_demographics.cd_marital_status = 'D' and
      customer_demographics.cd_education_status = 'Unknown' and
      date_dim.d_year = 1998 and
      store.s_state in ('KS','AL', 'MN', 'AL', 'SC', 'VT')
group by i_item_id, s_state
order by i_item_id, s_state
limit 10;
{code}
BinarySortableDeserializeRead.java:213 is in the Long parsing path, so the reader
might be accidentally deserializing an Int key through the Long codepath. That
path reads 8 bytes, which would run past the end of a 4-byte int key.
{code}
208     case LONG:
209       {
210         final boolean invert = columnSortOrderIsDesc[fieldIndex];
211         long v = inputByteBuffer.read(invert) ^ 0x80;
212         for (int i = 0; i < 7; i++) {
213           v = (v << 8) + (inputByteBuffer.read(invert) & 0xff);
214         }
215         currentLong = v;
216       }
217       break;
{code}
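To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not Hive code: {{IntKeyLongReadSketch}}, {{Buf}}, {{writeInt}}, {{readInt}} and {{readLong}} are hypothetical names, the leading extra byte from the 4+1 layout is left out, and only the two read loops mirror the shape of the snippet above. It assumes an int key is written as 4 big-endian bytes with the sign bit flipped, and shows that reading those 4 bytes through an 8-byte long loop runs off the end of the buffer.

```java
import java.io.EOFException;

public class IntKeyLongReadSketch {

    /** Hypothetical stand-in for a positional byte reader; not Hive's InputByteBuffer. */
    public static final class Buf {
        private final byte[] data;
        private int pos;

        public Buf(byte[] data) {
            this.data = data;
        }

        public byte read() throws EOFException {
            if (pos >= data.length) {
                throw new EOFException("read past end at offset " + pos);
            }
            return data[pos++];
        }
    }

    /** Assumed int layout: 4 big-endian bytes, sign bit flipped in the first byte. */
    public static byte[] writeInt(int v) {
        return new byte[] {
            (byte) ((v >> 24) ^ 0x80),
            (byte) (v >> 16),
            (byte) (v >> 8),
            (byte) v
        };
    }

    /** Reads 1 + 3 = 4 bytes, same loop shape as the LONG case but narrower. */
    public static int readInt(Buf in) throws EOFException {
        int v = in.read() ^ 0x80;
        for (int i = 0; i < 3; i++) {
            v = (v << 8) + (in.read() & 0xff);
        }
        return v;
    }

    /** Reads 1 + 7 = 8 bytes, mirroring the LONG case quoted above. */
    public static long readLong(Buf in) throws EOFException {
        long v = in.read() ^ 0x80;
        for (int i = 0; i < 7; i++) {
            v = (v << 8) + (in.read() & 0xff);
        }
        return v;
    }

    public static void main(String[] args) throws EOFException {
        byte[] key = writeInt(1998);

        // Matching codepath: the 4-byte key round-trips.
        System.out.println("as int:  " + readInt(new Buf(key)));

        // Mismatched codepath: the long loop asks for a 5th byte that is not there.
        try {
            readLong(new Buf(key));
        } catch (EOFException e) {
            System.out.println("as long: EOFException (" + e.getMessage() + ")");
        }
    }
}
```

If this is what is happening, casting the join columns to bigint would hide the problem simply because the writer and the reader then agree on 8 bytes.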
The sort-order problems with var-int encoding might be the reason int and long
are encoded with different fixed byte widths inside BinarySortable.
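A small sketch of the sort-order point, under stated assumptions: the variable-length format here is a generic 7-bits-per-byte varint chosen only for illustration (it is not claimed to be the format Hive uses anywhere), and {{VarIntSortOrderSketch}}, {{fixedInt}}, {{varInt}} and {{compareBytes}} are hypothetical names. It compares encoded keys byte by byte as unsigned values, which is how a binary-sortable key would be ordered.

```java
import java.io.ByteArrayOutputStream;

public class VarIntSortOrderSketch {

    /** Fixed width: 4 big-endian bytes with the sign bit flipped. */
    public static byte[] fixedInt(int v) {
        return new byte[] {
            (byte) ((v >> 24) ^ 0x80),
            (byte) (v >> 16),
            (byte) (v >> 8),
            (byte) v
        };
    }

    /** Illustrative varint for non-negative values: low 7 bits first, high bit = "more bytes follow". */
    public static byte[] varInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7f) != 0) {
            out.write((v & 0x7f) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Fixed width keeps numeric order: 255 sorts before 256.
        System.out.println("fixed:  255 < 256 ? " + (compareBytes(fixedInt(255), fixedInt(256)) < 0));

        // This varint does not: 255 is [ff 01] and 256 is [80 02], so 255 sorts after 256.
        System.out.println("varint: 255 < 256 ? " + (compareBytes(varInt(255), varInt(256)) < 0));
    }
}
```

With the fixed-width layout the byte order and the numeric order agree, including across the sign boundary, which is consistent with ints and longs each getting their own fixed width.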
> (Part 2) EOFException with fast hashtable
> -----------------------------------------
>
> Key: HIVE-13818
> URL: https://issues.apache.org/jira/browse/HIVE-13818
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-13818.01.patch
>
>
> Changes for HIVE-13682 did fix a bug in Fast Hash Tables, but evidently not
> this issue according to Gopal/Rajesh/Nita.