[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-14 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.27.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, 
 hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, 
 hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, 
 hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, 
 hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, 
 hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, 
 hive.3403.26.patch, hive.3403.27.patch, hive.3403.2.patch, hive.3403.3.patch, 
 hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, 
 hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort-merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true and also specify the 
 mapjoin hint.
 The user should not have to specify any hints.
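A sketch of the two modes described above (table names and bucket counts are hypothetical; hive.optimize.bucketmapjoin.sortedmerge is named in this issue, and hive.optimize.bucketmapjoin is the related bucket-join flag):

```sql
-- Both tables are assumed bucketed and sorted on the join key, e.g.:
--   CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS

-- Before this change: the user must enable the optimization AND hint the table.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SELECT /*+ MAPJOIN(b) */ a.key, a.value, b.value
FROM tbl_a a JOIN tbl_b b ON a.key = b.key;

-- Desired behavior: the same query with no hint should still be
-- converted to a sort-merge bucketed join automatically.
SELECT a.key, a.value, b.value
FROM tbl_a a JOIN tbl_b b ON a.key = b.key;
```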

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

2013-02-14 Thread neelesh gadhia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578222#comment-13578222
 ] 

neelesh gadhia commented on HIVE-2361:
--

Can you post the URL for the discussion forum for the hive-user mailing list? Or 
is it just the email address I need to send the details about the issue to?

  Add some UDFs which help to migrate Oracle to Hive
 ---

 Key: HIVE-2361
 URL: https://issues.apache.org/jira/browse/HIVE-2361
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.8.0
Reporter: JunHo Cho
Assignee: JunHo Cho
Priority: Minor
  Labels: features
 Attachments: nexr-udf.tar


 Here are some UDFs that can be matched to Oracle functions.
 There are two kinds of Oracle functions: scalar functions and analytic 
 functions.
 Most scalar functions in Oracle can be converted to Hive UDFs directly.  
 Oracle Scalar Function
 GenericUDFDecode : Compares first argument to each other value one by one. 
 e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
 GenericUDFGreatest : Return the greatest of the list of one or more 
 expressions. e.g., GREATEST(2,5,12,3) will return 12
 GenericUDFInstr : Return the location of a substring in a string. e.g., 
 INSTR('next', 'e') will return 2
 GenericUDFLnnvl : Evaluate a condition when one or both operands of the 
 condition may be null. e.g., LNNVL(2 > 4) will return true
 GenericUDFNVL : Replace null with a string in the results of a query. e.g., 
 NVL(null,'hive') will return hive
 GenericUDFNVL2 : Determine the value returned by a query based on whether a 
 specified expression is null or not null. e.g., NVL2(null,'not null','null 
 value') will return 'null value'
 GenericUDFToNumber : Convert a string to a number. e.g., 
 TO_NUMBER('112','999') will return 112
 GenericUDFTrunc : Returns a date truncated to a specific unit of measure. 
 e.g., TRUNC('2002-11-02 01:01:01','') will return '2002-01-01 00:00:00'
 Oracle Analytic Function
 Most analytic functions in Oracle can't be converted to hive's query and udf 
 directly.
 Following UDFs should be used with DISTRIBUTE BY, SORT BY, and hash() in Hive 
 to support analytic functions, 
 e.g., SELECT _FUNC_(hash(col1), col2, ...) FROM (SELECT ~ FROM table 
 DISTRIBUTE BY hash(col1) SORT BY col1, col2 ...)
 GenericUDFSum : Calculate a cumulative sum.
 GenericUDFRank : Assign a sequential order, or rank within some group based 
 on key.
 GenericUDFDenseRank : Act like RANK function except that it assigns 
 consecutive ranks.
 GenericUDFRowNumber : Return sequence integer value within some group based 
 on key.
 GenericUDFMax : Determine the highest value within some group based on key.
 GenericUDFMin : Determine the lowest value within some group based on key.
 GenericUDFLag : Access data from a previous row.
 These UDFs were developed with the Hive PDK.
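A sketch of the distribute-and-sort pattern described above, applied to a rank-style UDF (table and column names are hypothetical, and the registered UDF name may differ from the GenericUDFRank class name):

```sql
-- Hypothetical: assumes the NexR rank UDF is registered as 'nexr_rank'.
-- The subquery routes all rows of a group to one reducer via hash(deptno)
-- and sorts within the group, so the UDF can emit a sequential rank
-- for each (deptno) group in order of descending salary.
SELECT nexr_rank(hash(deptno), salary) AS rnk, deptno, empno, salary
FROM (
  SELECT deptno, empno, salary
  FROM emp
  DISTRIBUTE BY hash(deptno)
  SORT BY deptno, salary DESC
) t;
```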



Re: [jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

2013-02-14 Thread Lefty Leverenz
Here you go:

http://hive.apache.org/mailing_lists.html#Developers
Users

If you use Hive, please subscribe to the Hive user mailing list.

The Hive user mailing list is: u...@hive.apache.org.

   - Subscribe to List user-subscr...@hive.apache.org
   - Unsubscribe from List user-unsubscr...@hive.apache.org
   - Archives http://mail-archives.apache.org/mod_mbox/hive-user/
   - Archives from when Hive was still a Hadoop sub-project: 
http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/


– Lefty






[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-02-14 Thread Ilya Katsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Katsov updated HIVE-2991:
--

Attachment: hive.2991.3.branch-0.9.patch
hive.2991.3.branch-0.10.patch

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool, Clover, to the 
 ASF. Let's use it to generate a code coverage report to figure out which areas 
 of Hive are well tested and which are not. More information about the license 
 can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-02-14 Thread Ilya Katsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Katsov updated HIVE-2991:
--

Attachment: hive.2991.3.trunk.patch

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool, Clover, to the 
 ASF. Let's use it to generate a code coverage report to figure out which areas 
 of Hive are well tested and which are not. More information about the license 
 can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Commented] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578423#comment-13578423
 ] 

Rohini Palaniswamy commented on HIVE-3911:
--

The problem is actually with Java: double-precision results differ based on the 
order in which the numbers are summed up. 
https://issues.apache.org/jira/browse/PIG-2484?focusedCommentId=13430431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13430431
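The order dependence is easy to reproduce outside Hive; a minimal Python sketch of IEEE-754 double non-associativity:

```python
# IEEE-754 double addition is not associative: summing the same three
# values in a different grouping can change the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right)   # 0.6000000000000001
print(right_to_left)   # 0.6
print(left_to_right == right_to_left)  # False
```

This is why a reduce-side sum (one accumulation order) and a map-side partial aggregation (a different order) can legitimately produce slightly different doubles for the same data.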

 udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
 disabled.
 -

 Key: HIVE-3911
 URL: https://issues.apache.org/jira/browse/HIVE-3911
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0

 Attachments: HIVE-3911.patch


 I am running Hive 0.10 unit tests against Hadoop 0.23.5, and 
 udaf_percentile_approx.q fails with a different value when map-side aggr is 
 disabled, and only when the 3rd argument to this UDAF is 100. It matches the 
 expected output when map-side aggr is enabled for the same arguments.
 This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x, 
 2.0.0-alpha, or 2.0.2-alpha.
 [junit] 20c20
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 47c47
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 74c74
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]
 [junit] 101c101
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]



Hive-trunk-h0.21 - Build # 1970 - Still Failing

2013-02-14 Thread Apache Jenkins Server
Changes for Build #1964
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #1965

Changes for Build #1966

Changes for Build #1967

Changes for Build #1968
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #1969

Changes for Build #1970



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1970)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1970/ to 
view the results.

[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578523#comment-13578523
 ] 

Michael Malak commented on HIVE-3528:
-

I've tried the latest Avro SerDe from GitHub, and it allows me to write NULLs 
with a simple schema, but not with anything involving STRUCTs.  If a STRUCT 
contains NULLable fields, or the entire STRUCT is NULLable (optional), Hive 
throws exceptions.  Should I create new JIRA item(s)?

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.
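For reference, the kind of nullable Avro type involved is a union of null with a complex type; a sketch (record and field names are made up):

```json
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "maybe_inner",
     "type": ["null",
              {"type": "record",
               "name": "Inner",
               "fields": [{"name": "s", "type": "string"}]}]}
  ]
}
```

Deserialization resolves which branch of the union a datum uses, but serialization must write with the resolved branch's schema rather than the union itself; using the union schema is what produces the mismatch errors listed above.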



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578534#comment-13578534
 ] 

Sean Busbey commented on HIVE-3528:
---

Hi Michael,

Do you mean [Haivvreo|https://github.com/jghoman/haivvreo]? I don't believe 
this fix has been backported. Issues with that project should be filed [against 
its issue tracker|https://github.com/jghoman/haivvreo/issues]. I believe that 
project is only maintained on a best effort basis now, in favor of the 
integrated support within Hive.

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578538#comment-13578538
 ] 

Michael Malak commented on HIVE-3528:
-

Sean:

I mean
https://github.com/apache/hive/tree/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro

which has:
AvroSerializer.java, changed 20 days ago by the commit HIVE-3528 : Avro SerDe 
doesn't handle serializing Nullable types that ...


 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578545#comment-13578545
 ] 

Sean Busbey commented on HIVE-3528:
---

In that case, since 0.11 hasn't gone out yet, either including error 
information here or starting another JIRA is fine.

I'd recommend including the error information here until we can determine if 
your problem is the one that this patch is supposed to fix.

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578578#comment-13578578
 ] 

Phabricator commented on HIVE-4010:
---

mshang has added reviewers to the revision HIVE-4010 [jira] Failure finding 
iterate method with matching signature.
Added Reviewers: njain

  ping

REVISION DETAIL
  https://reviews.facebook.net/D8517

To: JIRA, jonchang, kevinwilfong, njain, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578587#comment-13578587
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  A couple of minor style comments, according to the style guide 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConvention
 :

  There are a number of places in the code where you are missing spaces around 
+ operators (e.g. line 58 in DynamicByteArray), you're missing a space between 
for and ( (e.g. line 63 in DynamicByteArray), and you're missing a space before 
a : in a for-each loop (e.g. line 191 in OrcStruct).

  Mentioning these now as I don't want them to hold up a commit later.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java:149-151 Is this 
loop necessary?  result is a boolean array so all of these entries will default 
to false anyway
  ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java:136-140 I'm a little 
confused by this, if compressed is null, why aren't you initializing overflow 
as well?
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java:307 I saw issues 
with this, and TypeInfoUtils expecting array instead of list.
  ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java:561-562 As far as I 
can tell, by storing the intermediate string data in these structures which do 
not write to a stream until writeStripe is called, the size of string columns 
is not being accounted for at all when determining whether or not to write out 
the stripe.  (This could be fixed as a follow up)

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RCFile format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push-down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes
 * there is no mechanism for storing lightweight indexes within the file to 
 enable push-down filters to skip entire row groups
 * the types of the rows aren't stored in the file



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-14 Thread Jingwei Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Attachment: HIVE-3672.6.patch.txt

Refreshed on 2/14. 

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types. As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new_type);
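Under the proposed syntax, a hedged example (the table and column names are hypothetical):

```sql
-- Change a partition column previously declared as INT to STRING.
ALTER TABLE sales PARTITION COLUMN (ds STRING);
```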



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-14 Thread Jingwei Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Status: Patch Available  (was: Open)

Refresh and merged on 2/14

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types. As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new_type);



[jira] [Created] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created HIVE-4021:


 Summary: PostgreSQL upgrade scripts are creating column with 
incorrect name
 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial


I've noticed that the PostgreSQL upgrade scripts are creating the tables 
{{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
whereas Hive (and all the other scripts) expects the column name 
{{DOUBLE_HIGH_VALUE}} (without the S at the end).
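For databases already upgraded with the bad script, a possible corrective statement (hedged: the actual patch fixes the upgrade script itself, and identifier quoting may differ per installation):

```sql
ALTER TABLE "PART_COL_STATS" RENAME COLUMN "DOUBLE_HIGH_VALUES" TO "DOUBLE_HIGH_VALUE";
ALTER TABLE "TAB_COL_STATS"  RENAME COLUMN "DOUBLE_HIGH_VALUES" TO "DOUBLE_HIGH_VALUE";
```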



Review Request: HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9456/
---

Review request for hive.


Description
---

I've fixed the column name to the singular form that the code and other scripts 
expect.


This addresses bug HIVE-4021.
https://issues.apache.org/jira/browse/HIVE-4021


Diffs
-

  /trunk/metastore/scripts/upgrade/postgres/012-HIVE-1362.postgres.sql 1446320 

Diff: https://reviews.apache.org/r/9456/diff/


Testing
---


Thanks,

Jarek Cecho



[jira] [Updated] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4021:
-

Attachment: bugHIVE-4021.patch

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
 whereas Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578697#comment-13578697
 ] 

Jarek Jarcec Cecho commented on HIVE-4021:
--

Might I ask someone with sufficient privileges to add me to the contributor 
group on JIRA? It seems that I still can't assign the ticket to myself.

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
 whereas Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Updated] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-14 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4010:
--

Attachment: HIVE-4010.D8517.2.patch

mshang updated the revision HIVE-4010 [jira] Failure finding iterate method 
with matching signature.

  More informative comments.

Reviewers: JIRA, jonchang, kevinwilfong, njain

REVISION DETAIL
  https://reviews.facebook.net/D8517

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8517?vs=27597&id=2#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
  ql/src/test/org/apache/hadoop/hive/ql/udf/UDAFTestMethodOverloading.java
  ql/src/test/queries/clientpositive/create_udaf_overload.q
  ql/src/test/results/clientpositive/create_udaf_overload.q.out
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java

To: JIRA, jonchang, kevinwilfong, njain, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch, HIVE-4010.D8517.2.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578705#comment-13578705
 ] 

Shreepadma Venugopalan commented on HIVE-4021:
--

Looks good, +1.

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with a column named {{DOUBLE_HIGH_VALUES}}; 
 however, Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Commented] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578762#comment-13578762
 ] 

Phabricator commented on HIVE-4000:
---

ashutoshc has accepted the revision HIVE-4000 [jira] Hive client goes into 
infinite loop at 100% cpu.

  +1 Running tests.

REVISION DETAIL
  https://reviews.facebook.net/D8493

BRANCH
  hive-4000

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley
Cc: brock


 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.
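
One way to see the hazard the description points at, and the usual remedy, is to route all shared progress state through a thread-safe map. A minimal sketch follows; the ProgressTracker class, method names, and job IDs are illustrative, not Hive's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProgressTracker {
    // Shared static state, analogous to the progress maps the Hive client's
    // monitor threads touch. A ConcurrentHashMap (or explicit locking around
    // a plain HashMap) prevents the corrupted internal state that can make
    // an unsynchronized HashMap spin forever during a concurrent resize.
    private static final Map<String, Integer> PROGRESS = new ConcurrentHashMap<>();

    public static void update(String jobId, int pct) {
        PROGRESS.put(jobId, pct);
    }

    public static int get(String jobId) {
        return PROGRESS.getOrDefault(jobId, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        // Two writer threads, mimicking per-job monitor threads.
        Thread t1 = new Thread(() -> { for (int i = 0; i <= 100; i++) update("job_1", i); });
        Thread t2 = new Thread(() -> { for (int i = 0; i <= 100; i++) update("job_2", i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(get("job_1") + " " + get("job_2")); // prints "100 100"
    }
}
```

An alternative with the same effect is to keep the HashMaps and synchronize every read and write on a common lock.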



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578783#comment-13578783
 ] 

Michael Malak commented on HIVE-3528:
-

Sean:

OK, I've researched the problem further.

There is in fact a null-struct test case in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

does indeed work when I tested it locally.  But in that test, the query gets 
all of its data from a test table verbatim:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we put a hard-coded null for the struct directly into the 
query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
table because column number/types are different 'as_avro': Cannot convert 
column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring 
struct1 to the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, 
boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, 
fixed1 FROM test_serializer;

I will be entering an all-new JIRA for this.


 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Created] (HIVE-4022) Avro SerDe queries don't handle hard-coded nulls for optional/nullable structs

2013-02-14 Thread Michael Malak (JIRA)
Michael Malak created HIVE-4022:
---

 Summary: Avro SerDe queries don't handle hard-coded nulls for 
optional/nullable structs
 Key: HIVE-4022
 URL: https://issues.apache.org/jira/browse/HIVE-4022
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Michael Malak


Related to HIVE-3528,

There is in fact a null-struct test case in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

does indeed work.  But in that test, the query gets all of its data from a test 
table verbatim:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we stick in a hard-coded null for the struct directly into the 
query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
table because column number/types are different 'as_avro': Cannot convert 
column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring 
struct1 into the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, 
boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, 
fixed1 FROM test_serializer;




[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2340:


Status: Patch Available  (was: Open)

Passed all tests

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: performance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
 HIVE-2340.13.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, 
 HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578829#comment-13578829
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  lso

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java:18 The package 
name doesn't match the directory structure. This doesn't seem to be causing 
the build to fail, but in Eclipse it shows up as an error. Could you adjust 
either the package name or the directory structure so they match?

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578830#comment-13578830
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  *Ignore the lso

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Created] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)
Bhushan Mandhani created HIVE-4023:
--

 Summary: Improve Error Logging in MetaStore
 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial


The RetryingHMSHandler should log the entire stack trace before throwing an 
exception.
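
The requested improvement can be sketched as capturing the full trace into a string before the rethrow. A minimal illustration; the ErrorLogging class and logging to stdout are assumptions, not the actual RetryingHMSHandler code:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ErrorLogging {
    // Renders a throwable's full stack trace as a string, so the whole
    // trace can be written to the log before the exception is rethrown.
    static String stackTraceToString(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        try {
            throw new RuntimeException("metastore failure");
        } catch (RuntimeException e) {
            // Log the whole trace first (here: stdout); the real handler
            // would then rethrow e to the caller as before.
            System.out.print(stackTraceToString(e));
        }
    }
}
```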



[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

2013-02-14 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated HIVE-2670:
---

Status: Patch Available  (was: Open)

[~alangates], please let me know what you think about the latest patch. 
Thanks.

 A cluster test utility for Hive
 ---

 Key: HIVE-2670
 URL: https://issues.apache.org/jira/browse/HIVE-2670
 Project: Hive
  Issue Type: New Feature
  Components: Testing Infrastructure
Reporter: Alan Gates
Assignee: Johnny Zhang
 Attachments: harness.tar, hive_cluster_test_2.patch, 
 hive_cluster_test_3.patch, hive_cluster_test_4.patch, hive_cluster_test.patch


 Hive has an extensive set of unit tests, but it does not have an 
 infrastructure for testing in a cluster environment.  Pig and HCatalog have 
 been using a test harness for cluster testing for some time.  We have written 
 Hive drivers and tests to run in this harness.



[jira] [Updated] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani updated HIVE-4023:
---

Attachment: HIVE-4023.1.patch.txt

 Improve Error Logging in MetaStore
 --

 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial
 Attachments: HIVE-4023.1.patch.txt


 The RetryingHMSHandler should log the entire stack trace before throwing an 
 exception.



[jira] [Updated] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani updated HIVE-4023:
---

Status: Patch Available  (was: Open)

 Improve Error Logging in MetaStore
 --

 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial
 Attachments: HIVE-4023.1.patch.txt


 The RetryingHMSHandler should log the entire stack trace before throwing an 
 exception.



[jira] [Resolved] (HIVE-3831) Add Command to Turn Sorting Off for a Bucketed Table

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani resolved HIVE-3831.


Resolution: Duplicate

 Add Command to Turn Sorting Off for a Bucketed Table
 

 Key: HIVE-3831
 URL: https://issues.apache.org/jira/browse/HIVE-3831
 Project: Hive
  Issue Type: Bug
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor

 If we have specified a bucketed table as sorted on some columns, there is no 
 Hive command to turn the sorting off for that table. There are scenarios 
 where we need to do this.



[jira] [Resolved] (HIVE-3716) Create Table Like should support TableProperties

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani resolved HIVE-3716.


Resolution: Duplicate

Fixed by Kevin Wilfong in another JIRA.

 Create Table Like should support TableProperties
 

 Key: HIVE-3716
 URL: https://issues.apache.org/jira/browse/HIVE-3716
 Project: Hive
  Issue Type: New Feature
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial

 Create Table Like currently doesn't allow the specification of 
 TableProperties for the created table. It will be useful to allow that.



Hive-trunk-hadoop2 - Build # 122 - Still Failing

2013-02-14 Thread Apache Jenkins Server
Changes for Build #84
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 0.10.0
 (Prasad Mujumdar via cws)


Changes for Build #85

Changes for Build #86
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin 
via Ashutosh Chauhan)

[hashutosh] HIVE-3833 : object inspectors should be initialized based on 
partition metadata (Namit Jain via Ashutosh Chauhan)


Changes for Build #87

Changes for Build #88
[namit] HIVE-3825 Add Operator level Hooks
(Pamela Vagata via namit)

[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types 
that require access to a Schema (Sean Busbey via Ashutosh Chauhan)

[namit] HIVE-3943 Skewed query fails if hdfs path has special characters
(Gang Tim Liu via namit)


Changes for Build #89
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
(Kevin Wilfong via namit)

[namit] HIVE-3944 Make accept qfile argument for miniMR tests
(Navis via namit)


Changes for Build #90
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3921 recursive_dir.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3923 join_filters_overlap.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3924 join_nullsafe.q fails on 0.23
(Sushanth Sowmyan via namit)

[hashutosh] Adding csv.txt file, left out from commit of 3528


Changes for Build #91

Changes for Build #92
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext 
cannot be loaded/instantiated (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion 
(Navis via Ashutosh Chauhan)


Changes for Build #93

Changes for Build #94
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a 
partition through the CLI. (Samuel Yuan via kevinwilfong)


Changes for Build #95
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #96
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784

[hashutosh] HIVE-3784 : de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #97
[namit] HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit)

[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan, Reviewed by Namit Jain)


Changes for Build #98

Changes for Build #99
[kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. 
(Samuel Yuan via kevinwilfong)


Changes for Build #100
[namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
(Gang Tim Liu via namit)


Changes for Build #101

Changes for Build #102

Changes for Build #103

Changes for Build #104
[hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny 
Zhang via Ashutosh Chauhan)

[hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL 
metastore scripts (Mark Grover via Ashutosh Chauhan)


Changes for Build #105
[hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via 
Ashutosh Chauhan)

[namit] HIVE-3917 Support noscan operation for analyze command
(Gang Tim Liu via namit)


Changes for Build #106
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #107

Changes for Build #108

Changes for Build #109

Changes for Build #110
[namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied 
correctly
(Navis via namit)


Changes for Build #111

Changes for Build #112
[namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 
0.9.0 to
0.10.0 (Jarek and Mark via namit)

[namit] HIVE-3999 Mysql metastore upgrade script will end up with different 
schema than
the full schema load (Jarek and Mark via namit)


Changes for Build #113

Changes for Build #114
[namit] HIVE-3995 PostgreSQL upgrade scripts are not valid
(Jarek and Mark via namit)


Changes for Build #115

Changes for Build #116
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #117

Changes for Build #118

Changes for Build #119

Changes for Build #120
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #121

Changes for Build #122



32 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try ant test ... 
-Dtest.silent=false to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 

[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3874:


Status: Open  (was: Patch Available)

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578920#comment-13578920
 ] 

Phabricator commented on HIVE-948:
--

navis has commented on the revision HIVE-948 [jira] more query plan 
optimization rules.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:75 
I don't know the issue well, but there is still a rule (R7:MAPJOIN%) about 
mapjoin in the genMapRedTasks() method in SemanticAnalyzer. What's that?

  I'll remove that.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CleanupProcessor.java:47 OK, 
NonBlockingOpDeDupProc will be good. When we need other cleanups, we can 
rename it or do other things.

REVISION DETAIL
  https://reviews.facebook.net/D8463

BRANCH
  DPAL-1980

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis


 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-948:
---

Attachment: HIVE-948.D8463.3.patch

It contains test changes and is too big for Phabricator. 

 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Commented] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578922#comment-13578922
 ] 

Ashutosh Chauhan commented on HIVE-4000:


Many tests are resulting in NPEs because we switched to ConcurrentHashMaps, which 
don't allow null keys (as opposed to HashMaps). Stacktrace:
{noformat}
[junit] java.lang.NullPointerException
[junit]     at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:782)
[junit]     at java.util.Collections$SetFromMap.contains(Collections.java:3574)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.extractCounters(QueryPlan.java:364)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.getQueryPlan(QueryPlan.java:444)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:617)
[junit]     at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:503)
[junit]     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:147)
[junit]     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
[junit]     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
[junit]     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1129)
[junit]     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
[junit]     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
[junit]     at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5759)
[junit]     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop(TestCliDriver.java:1923)
{noformat}

It seems we can fix this by modifying QueryPlan.java:364 to:
{code}
if (task.getId() != null && started.contains(task.getId())
    && done.contains(task.getId())) {
  continue;
}
{code}
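The contract difference behind the NPE can be seen in a small standalone sketch (illustrative code, not part of Hive):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        // HashMap tolerates a null key: containsKey(null) simply answers the question.
        Map<String, String> plain = new HashMap<>();
        System.out.println(plain.containsKey(null)); // prints "false"

        // ConcurrentHashMap rejects null keys (and values) outright.
        Map<String, String> concurrent = new ConcurrentHashMap<>();
        try {
            concurrent.containsKey(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from ConcurrentHashMap.containsKey(null)");
        }
    }
}
```

This is why the suggested fix guards the lookup with a `task.getId() != null` check before touching the concurrent sets.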

 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.



[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-948:
-

Attachment: HIVE-948.D8463.3.patch

navis updated the revision HIVE-948 [jira] more query plan optimization rules.

  Addressed comments

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8463

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8463?vs=27603&id=27807#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java

To: JIRA, ashutoshc, navis


 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruit that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 
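As an illustration only (not the actual patch), a rule that removes an identity select from a linear operator chain might look like the following sketch; the `Op` type and field names are invented for the example:

```java
import java.util.List;

public class IdentitySelectRule {
    // Minimal stand-in for a plan operator; names are illustrative, not Hive's.
    static class Op {
        final String type;
        final List<String> inputCols;
        final List<String> outputCols;
        Op child;

        Op(String type, List<String> inputCols, List<String> outputCols) {
            this.type = type;
            this.inputCols = inputCols;
            this.outputCols = outputCols;
        }
    }

    /** Splices out SELECT operators whose output schema equals their input schema. */
    static Op apply(Op op) {
        if (op == null) {
            return null;
        }
        op.child = apply(op.child);
        if ("SEL".equals(op.type) && op.inputCols.equals(op.outputCols)) {
            return op.child; // identity select: redundant, remove it
        }
        return op;
    }
}
```

A select that actually projects a subset of columns does not match the rule and is left in place.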



[jira] [Updated] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4000:
---

Status: Open  (was: Patch Available)

Also, Owen, can you add a sample query and a scenario where this problem shows 
up?

 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.



[jira] [Created] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created HIVE-4024:


 Summary: Derby metastore update script will fail when upgrading 
from 0.9.0 to 0.10.0
 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor


The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
contains the following line:

{code}
ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
{code}

This statement will, however, fail if the table SDS has at least one row.
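One common workaround pattern is to supply a default value so that existing rows can satisfy the new NOT NULL constraint; the exact Derby syntax and the default value shown below are assumptions for illustration, not the committed patch:

```sql
-- Assumed workaround: give the new NOT NULL column a default so the
-- ALTER succeeds even when SDS already contains rows.
ALTER TABLE SDS ADD IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL DEFAULT 'N';
```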



Review Request: HIVE-4024 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9462/
---

Review request for hive.


Description
---

I've provided similar solution as in HIVE-3995, HIVE-3998 and HIVE-3999.


This addresses bug HIVE-4024.
https://issues.apache.org/jira/browse/HIVE-4024


Diffs
-

  /trunk/metastore/scripts/upgrade/derby/011-HIVE-3649.derby.sql 1443292 

Diff: https://reviews.apache.org/r/9462/diff/


Testing
---

I've tested the upgrade procedure.


Thanks,

Jarek Cecho



[jira] [Updated] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4024:
-

Attachment: bugHIVE-4024.patch

 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0
 ---

 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: bugHIVE-4024.patch


 The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
 contains the following line:
 {code}
 ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
 {code}
 This statement will, however, fail if the table SDS has at least one row.



[jira] [Updated] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4024:
-

Status: Patch Available  (was: Open)

 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0
 ---

 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: bugHIVE-4024.patch


 The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
 contains the following line:
 {code}
 ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
 {code}
 This statement will, however, fail if the table SDS has at least one row.



[jira] [Resolved] (HIVE-201) fetch task appears as a root task in explain plan

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-201.


Resolution: Duplicate

 fetch task appears as a root task in explain plan
 -

 Key: HIVE-201
 URL: https://issues.apache.org/jira/browse/HIVE-201
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 fetch task appears as a root task in explain plan. that should be changed so 
 that fetch task depends on the execution task appropriately



[jira] [Reopened] (HIVE-201) fetch task appears as a root task in explain plan

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reopened HIVE-201:



Sorry, mistakenly pushed resolve.

 fetch task appears as a root task in explain plan
 -

 Key: HIVE-201
 URL: https://issues.apache.org/jira/browse/HIVE-201
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 fetch task appears as a root task in explain plan. that should be changed so 
 that fetch task depends on the execution task appropriately



[jira] [Resolved] (HIVE-1035) limit can be optimized if the limit is happening on the reducer

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-1035.
-

Resolution: Duplicate

Applied option-1 in HIVE-3550

 limit can be optimized if the limit is happening on the reducer
 ---

 Key: HIVE-1035
 URL: https://issues.apache.org/jira/browse/HIVE-1035
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 A query like:
 select ... from A join B..  limit 10;
 where the limit is performed on the reducer can be further optimized.
 Currently, all the operators on the reduce side will be done, but the 
 ExecReducer will un-necessarily deserialize all the rows.
 The following optimizations can be done:
 1. Do nothing in reduce() in ExecReducer.
 2. Modify map-reduce framework so that it does not even invoke the reduce() 
 method in ExecReducer.
 Option 2 may require some work from Hadoop, but we should minimally do option 1. 
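Optimization 1 amounts to a guard at the top of reduce(); a minimal language-level sketch (not the actual ExecReducer code, class and method names are illustrative):

```java
// Illustrative sketch of optimization 1: once LIMIT rows have been handled,
// reduce() bails out immediately instead of deserializing further rows.
public class LimitShortCircuit {
    private final int limit;
    private int emitted = 0;

    public LimitShortCircuit(int limit) {
        this.limit = limit;
    }

    /** Returns true if the row was processed, false once the limit is reached. */
    public boolean reduce(Object row) {
        if (emitted >= limit) {
            return false; // skip deserialization and the operator pipeline
        }
        emitted++;
        // ... the real reducer would forward the row through its operators here ...
        return true;
    }
}
```

Optimization 2 would go further and avoid invoking reduce() at all, which is why it may need framework support.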



[jira] [Commented] (HIVE-3997) Use distributed cache to cache/localize dimension table & filter it in map task setup

2013-02-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579018#comment-13579018
 ] 

Gopal V commented on HIVE-3997:
---

All map tasks are happening in a single wave, but the hashtable generation 
before the map-side task is taking 2x the time it took on the client node.

This is probably due to CPU starvation in the map task caused by too many 
parallel tasks; I couldn't find a way to tune the map count per node down from 
12 (its current value), because the NodeManager does not seem to have a tunable 
for it (?).

 Use distributed cache to cache/localize dimension table & filter it in map 
 task setup
 -

 Key: HIVE-3997
 URL: https://issues.apache.org/jira/browse/HIVE-3997
 Project: Hive
  Issue Type: Improvement
Reporter: Gopal V
Assignee: Gopal V

 The Hive clients are not always co-located with the Hadoop/HDFS cluster.
 This means that dimension table filtering, when done on the client side, 
 becomes very slow. Not only that, the conversion of the small tables into 
 hashtables has to be done every single time a query is run with different 
 filters on the big table.
 That entire hashtable has to be part of the job, which involves even more 
 HDFS writes from the far client side.
 Using the distributed cache also has the advantage that the localized files 
 can be kept between jobs instead of firing off an HDFS read for every query.
 Moving the operator pipeline for the hash generation into the map task itself 
 has perhaps a few cons.
 The map task might OOM due to this change, and it will take longer to recover 
 since all the map attempts must fail first, instead of the decision being made 
 conditionally on the client. The client has no idea how much memory the 
 hashtable needs and has to rely on the disk sizes (compressed sizes, perhaps) 
 to determine if it needs to fall back onto a reduce-join instead.
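As a rough sketch of the map-task-setup approach, the hashtable can be built from the localized dimension-table file when the task starts; the class name, method name, and key<TAB>value file format below are assumptions for illustration, not Hive's API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

public class DimTableHashLoader {
    /**
     * Builds the join hashtable from a localized dimension-table file during
     * task setup, instead of shipping a prebuilt hashtable from the client.
     * Assumes key<TAB>value rows; malformed rows are skipped.
     */
    public static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip rows without a key/value separator
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }
}
```

In the distributed-cache scheme, the `Reader` would come from the file the framework localized on the node, so the HDFS read happens once per node rather than once per query.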
