[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-14 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.27.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, 
 hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, 
 hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, 
 hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, 
 hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, 
 hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, 
 hive.3403.26.patch, hive.3403.27.patch, hive.3403.2.patch, hive.3403.3.patch, 
 hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, 
 hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort-merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true and also specify the 
 mapjoin hint.
 The user should not have to specify any hints.
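A sketch of the two modes described above (table names and bucket counts are hypothetical; hive.optimize.bucketmapjoin.sortedmerge is named in this issue, and hive.optimize.bucketmapjoin is the related bucket-join flag):

```sql
-- Both tables are assumed bucketed and sorted on the join key, e.g.:
--   CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS

-- Before this change: the user must enable the optimization AND hint the table.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SELECT /*+ MAPJOIN(b) */ a.key, a.value, b.value
FROM tbl_a a JOIN tbl_b b ON a.key = b.key;

-- Desired behavior: the same query with no hint should still be
-- converted to a sort-merge bucketed join automatically.
SELECT a.key, a.value, b.value
FROM tbl_a a JOIN tbl_b b ON a.key = b.key;
```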

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

2013-02-14 Thread neelesh gadhia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578222#comment-13578222
 ] 

neelesh gadhia commented on HIVE-2361:
--

Can you post the URL for the discussion forum for the hive-user mailing list? Or 
is it just the email address I need to send the details about the issue to?

  Add some UDFs which help to migrate Oracle to Hive
 ---

 Key: HIVE-2361
 URL: https://issues.apache.org/jira/browse/HIVE-2361
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.8.0
Reporter: JunHo Cho
Assignee: JunHo Cho
Priority: Minor
  Labels: features
 Attachments: nexr-udf.tar


 Here are some UDFs that can be matched to Oracle functions.
 There are two kinds of Oracle functions: scalar functions and analytic 
 functions.
 Most scalar functions in Oracle can be converted to Hive UDFs directly.  
 Oracle Scalar Function
 GenericUDFDecode : Compares first argument to each other value one by one. 
 e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
 GenericUDFGreatest : Return the greatest of the list of one or more 
 expressions. e.g., GREATEST(2,5,12,3) will return 12
 GenericUDFInstr : Return the location of a substring in a string. e.g., 
 INSTR('next', 'e') will return 2
 GenericUDFLnnvl : Evaluate a condition when one or both operands of the 
 condition may be null. e.g., LNNVL(2 > 4) will return true
 GenericUDFNVL : Replace null with a string in the results of a query. e.g., 
 NVL(null,'hive') will return hive
 GenericUDFNVL2 : Determine the value returned by a query based on whether a 
 specified expression is null or not null. e.g., NVL2(null,'not null','null 
 value') will return 'null value'
 GenericUDFToNumber : Convert a string to a number. e.g., 
 TO_NUMBER('112','999') will return 112
 GenericUDFTrunc : Returns a date truncated to a specific unit of measure. 
 e.g., TRUNC('2002-11-02 01:01:01','') will return '2002-01-01 00:00:00'
 Oracle Analytic Function
 Most analytic functions in Oracle can't be converted to hive's query and udf 
 directly.
 Following UDFs should be used with DISTRIBUTE BY, SORT BY, and hash() in Hive 
 to support analytic functions, 
 e.g., SELECT _FUNC_(hash(col1), col2, ...) FROM (SELECT ~ FROM table 
 DISTRIBUTE BY hash(col1) SORT BY col1, col2 ...)
 GenericUDFSum : Calculate a cumulative sum.
 GenericUDFRank : Assign a sequential order, or rank within some group based 
 on key.
 GenericUDFDenseRank : Act like RANK function except that it assigns 
 consecutive ranks.
 GenericUDFRowNumber : Return sequence integer value within some group based 
 on key.
 GenericUDFMax : Determine the highest value within some group based on key.
 GenericUDFMin : Determine the lowest value within some group based on key.
 GenericUDFLag : Access data from a previous row.
 These UDFs were developed with the Hive PDK.
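A sketch of the distribute-and-sort pattern described above, applied to a rank-style UDF (table and column names are hypothetical, and the registered UDF name may differ from the GenericUDFRank class name):

```sql
-- Hypothetical: assumes the NexR rank UDF is registered as 'nexr_rank'.
-- The subquery routes all rows of a group to one reducer via hash(deptno)
-- and sorts within the group, so the UDF can emit a sequential rank
-- for each (deptno) group in order of descending salary.
SELECT nexr_rank(hash(deptno), salary) AS rnk, deptno, empno, salary
FROM (
  SELECT deptno, empno, salary
  FROM emp
  DISTRIBUTE BY hash(deptno)
  SORT BY deptno, salary DESC
) t;
```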



Re: [jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

2013-02-14 Thread Lefty Leverenz
Here you go:

http://hive.apache.org/mailing_lists.html#Developers
Users

If you use Hive, please subscribe to the Hive user mailing list.

The Hive user mailing list is: u...@hive.apache.org.

   - Subscribe to List user-subscr...@hive.apache.org
   - Unsubscribe from List user-unsubscr...@hive.apache.org
   - Archives http://mail-archives.apache.org/mod_mbox/hive-user/
   - Archives from when Hive was still a Hadoop sub-project: 
http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/


– Lefty






[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-02-14 Thread Ilya Katsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Katsov updated HIVE-2991:
--

Attachment: hive.2991.3.branch-0.9.patch
hive.2991.3.branch-0.10.patch

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool, Clover, to the 
 ASF. Let's use it to generate a code coverage report to figure out which areas 
 of Hive are well tested and which are not. More information about the license 
 can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-02-14 Thread Ilya Katsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Katsov updated HIVE-2991:
--

Attachment: hive.2991.3.trunk.patch

 Integrate Clover with Hive
 --

 Key: HIVE-2991
 URL: https://issues.apache.org/jira/browse/HIVE-2991
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
 hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
 hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
 hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
 hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
 hive.2991.3.trunk.patch, hive-trunk-clover-html-report.zip


 Atlassian has donated a license for their code coverage tool, Clover, to the 
 ASF. Let's use it to generate a code coverage report to figure out which areas 
 of Hive are well tested and which are not. More information about the license 
 can be found in Hadoop JIRA HADOOP-1718. 



[jira] [Commented] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578423#comment-13578423
 ] 

Rohini Palaniswamy commented on HIVE-3911:
--

The problem is actually with Java: double-precision results differ based on the 
order in which the numbers are summed up. 
https://issues.apache.org/jira/browse/PIG-2484?focusedCommentId=13430431&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13430431
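The order dependence is easy to reproduce outside Hive; a minimal Python sketch of IEEE-754 double non-associativity:

```python
# IEEE-754 double addition is not associative: summing the same three
# values in a different grouping can change the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right)   # 0.6000000000000001
print(right_to_left)   # 0.6
print(left_to_right == right_to_left)  # False
```

This is why a reduce-side sum (one accumulation order) and a map-side partial aggregation (a different order) can legitimately produce slightly different doubles for the same data.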

 udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
 disabled.
 -

 Key: HIVE-3911
 URL: https://issues.apache.org/jira/browse/HIVE-3911
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0

 Attachments: HIVE-3911.patch


 I am running Hive 0.10 unit tests against Hadoop 0.23.5, and 
 udaf_percentile_approx.q fails with a different value when map-side aggr is 
 disabled, and only when the 3rd argument to this UDAF is 100. It matches the 
 expected output when map-side aggr is enabled for the same arguments.
 This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x, 
 2.0.0-alpha, or 2.0.2-alpha.
 [junit] 20c20
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 47c47
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 74c74
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]
 [junit] 101c101
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]



Hive-trunk-h0.21 - Build # 1970 - Still Failing

2013-02-14 Thread Apache Jenkins Server
Changes for Build #1964
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #1965

Changes for Build #1966

Changes for Build #1967

Changes for Build #1968
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #1969

Changes for Build #1970



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1970)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1970/ to 
view the results.

[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578523#comment-13578523
 ] 

Michael Malak commented on HIVE-3528:
-

I've tried the latest Avro SerDe from GitHub, and it allows me to write NULLs 
with a simple schema, but not with anything involving STRUCTs.  If a STRUCT 
contains NULLable fields, or the entire STRUCT is NULLable (optional), Hive 
throws exceptions.  Should I create new JIRA item(s)?

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.
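For reference, the kind of nullable Avro type involved is a union of null with a complex type; a sketch (record and field names are made up):

```json
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "maybe_inner",
     "type": ["null",
              {"type": "record",
               "name": "Inner",
               "fields": [{"name": "s", "type": "string"}]}]}
  ]
}
```

Deserialization resolves which branch of the union a datum uses, but serialization must write with the resolved branch's schema rather than the union itself; using the union schema is what produces the mismatch errors listed above.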



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578534#comment-13578534
 ] 

Sean Busbey commented on HIVE-3528:
---

Hi Michael,

Do you mean [Haivvreo|https://github.com/jghoman/haivvreo]? I don't believe 
this fix has been backported. Issues with that project should be filed [against 
its issue tracker|https://github.com/jghoman/haivvreo/issues]. I believe that 
project is only maintained on a best effort basis now, in favor of the 
integrated support within Hive.

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578538#comment-13578538
 ] 

Michael Malak commented on HIVE-3528:
-

Sean:

I mean
https://github.com/apache/hive/tree/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro

which has:
AvroSerializer.java, changed 20 days ago by the commit HIVE-3528 : Avro SerDe 
doesn't handle serializing Nullable types that ...


 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578545#comment-13578545
 ] 

Sean Busbey commented on HIVE-3528:
---

In that case, since 0.11 hasn't gone out yet, either including error 
information here or starting another JIRA is fine.

I'd recommend including the error information here until we can determine if 
your problem is the one that this patch is supposed to fix.

 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578578#comment-13578578
 ] 

Phabricator commented on HIVE-4010:
---

mshang has added reviewers to the revision HIVE-4010 [jira] Failure finding 
iterate method with matching signature.
Added Reviewers: njain

  ping

REVISION DETAIL
  https://reviews.facebook.net/D8517

To: JIRA, jonchang, kevinwilfong, njain, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578587#comment-13578587
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  A couple of minor style comments, according to the style guide 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConvention
 :

  There are a number of places in the code where you are missing spaces around 
+ operators (e.g. line 58 in DynamicByteArray), you're missing a space between 
for and ( (e.g. line 63 in DynamicByteArray), and you're missing a space before 
a : in a for-each loop (e.g. line 191 in OrcStruct).

  Mentioning these now as I don't want them to hold up a commit later.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java:149-151 Is this 
loop necessary?  result is a boolean array so all of these entries will default 
to false anyway
  ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java:136-140 I'm a little 
confused by this, if compressed is null, why aren't you initializing overflow 
as well?
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java:307 I saw issues 
with this, and TypeInfoUtils expecting array instead of list.
  ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java:561-562 As far as I 
can tell, by storing the intermediate string data in these structures which do 
not write to a stream until writeStripe is called, the size of string columns 
is not being accounted for at all when determining whether or not to write out 
the stripe.  (This could be fixed as a follow up)

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RCFile format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push-down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes
 * there is no mechanism for storing lightweight indexes within the file to 
 enable push-down filters to skip entire row groups
 * the types of the rows aren't stored in the file



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-14 Thread Jingwei Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Attachment: HIVE-3672.6.patch.txt

Refreshed on 2/14. 

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types. As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new_type);
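Under the proposed syntax, a hedged example (the table and column names are hypothetical):

```sql
-- Change a partition column previously declared as INT to STRING.
ALTER TABLE sales PARTITION COLUMN (ds STRING);
```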



[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive

2013-02-14 Thread Jingwei Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingwei Lu updated HIVE-3672:
-

Status: Patch Available  (was: Open)

Refresh and merged on 2/14

 Support altering partition column type in Hive
 --

 Key: HIVE-3672
 URL: https://issues.apache.org/jira/browse/HIVE-3672
 Project: Hive
  Issue Type: Improvement
  Components: CLI, SQL
Reporter: Jingwei Lu
Assignee: Jingwei Lu
  Labels: features
 Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, 
 HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, 
 HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt

   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, Hive does not allow altering partition column types. As we've 
 discouraged users from using non-string partition column types, this presents 
 a problem for users who want to change their partition columns to strings: 
 they have to rename their table, create a new table, and copy all the data 
 over.
 To support this via the CLI, add a command like ALTER TABLE table_name 
 PARTITION COLUMN (column_name new_type);



[jira] [Created] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created HIVE-4021:


 Summary: PostgreSQL upgrade scripts are creating column with 
incorrect name
 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial


I've noticed that the PostgreSQL upgrade scripts are creating the tables 
{{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
whereas Hive (and all the other scripts) expects the column name 
{{DOUBLE_HIGH_VALUE}} (without the S at the end).
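For databases already upgraded with the bad script, a possible corrective statement (hedged: the actual patch fixes the upgrade script itself, and identifier quoting may differ per installation):

```sql
ALTER TABLE "PART_COL_STATS" RENAME COLUMN "DOUBLE_HIGH_VALUES" TO "DOUBLE_HIGH_VALUE";
ALTER TABLE "TAB_COL_STATS"  RENAME COLUMN "DOUBLE_HIGH_VALUES" TO "DOUBLE_HIGH_VALUE";
```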



Review Request: HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9456/
---

Review request for hive.


Description
---

I've fixed the column name to the singular form that the code and other scripts 
expect.


This addresses bug HIVE-4021.
https://issues.apache.org/jira/browse/HIVE-4021


Diffs
-

  /trunk/metastore/scripts/upgrade/postgres/012-HIVE-1362.postgres.sql 1446320 

Diff: https://reviews.apache.org/r/9456/diff/


Testing
---


Thanks,

Jarek Cecho



[jira] [Updated] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4021:
-

Attachment: bugHIVE-4021.patch

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
 whereas Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578697#comment-13578697
 ] 

Jarek Jarcec Cecho commented on HIVE-4021:
--

Might I ask someone with sufficient privileges to add me to the contributor 
group on JIRA? It seems that I still can't assign the ticket to myself.

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with the column {{DOUBLE_HIGH_VALUES}}, 
 whereas Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Updated] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-14 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4010:
--

Attachment: HIVE-4010.D8517.2.patch

mshang updated the revision HIVE-4010 [jira] Failure finding iterate method 
with matching signature.

  More informative comments.

Reviewers: JIRA, jonchang, kevinwilfong, njain

REVISION DETAIL
  https://reviews.facebook.net/D8517

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8517?vs=27597&id=2#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
  ql/src/test/org/apache/hadoop/hive/ql/udf/UDAFTestMethodOverloading.java
  ql/src/test/queries/clientpositive/create_udaf_overload.q
  ql/src/test/results/clientpositive/create_udaf_overload.q.out
  serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java

To: JIRA, jonchang, kevinwilfong, njain, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch, HIVE-4010.D8517.2.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578705#comment-13578705
 ] 

Shreepadma Venugopalan commented on HIVE-4021:
--

Looks good, +1.

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts are creating the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with a column named {{DOUBLE_HIGH_VALUES}}; 
 however, Hive (and all the other scripts) expects the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Commented] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578762#comment-13578762
 ] 

Phabricator commented on HIVE-4000:
---

ashutoshc has accepted the revision HIVE-4000 [jira] Hive client goes into 
infinite loop at 100% cpu.

  +1 Running tests.

REVISION DETAIL
  https://reviews.facebook.net/D8493

BRANCH
  hive-4000

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley
Cc: brock


 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.
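
One way to see the hazard the description points at, and the usual remedy, is to route all shared progress state through a thread-safe map. A minimal sketch follows; the ProgressTracker class, method names, and job IDs are illustrative, not Hive's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProgressTracker {
    // Shared static state, analogous to the progress maps the Hive client's
    // monitor threads touch. A ConcurrentHashMap (or explicit locking around
    // a plain HashMap) prevents the corrupted internal state that can make
    // an unsynchronized HashMap spin forever during a concurrent resize.
    private static final Map<String, Integer> PROGRESS = new ConcurrentHashMap<>();

    public static void update(String jobId, int pct) {
        PROGRESS.put(jobId, pct);
    }

    public static int get(String jobId) {
        return PROGRESS.getOrDefault(jobId, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        // Two writer threads, mimicking per-job monitor threads.
        Thread t1 = new Thread(() -> { for (int i = 0; i <= 100; i++) update("job_1", i); });
        Thread t2 = new Thread(() -> { for (int i = 0; i <= 100; i++) update("job_2", i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(get("job_1") + " " + get("job_2")); // prints "100 100"
    }
}
```

An alternative with the same effect is to keep the HashMaps and synchronize every read and write on a common lock.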



[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2013-02-14 Thread Michael Malak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578783#comment-13578783
 ] 

Michael Malak commented on HIVE-3528:
-

Sean:

OK, I've researched the problem further.

There is in fact a null-struct test case in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

does indeed work when I tested it locally.  But in that test, the query gets 
all of its data from a test table verbatim:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we put a hard-coded null for the struct directly into the 
query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
table because column number/types are different 'as_avro': Cannot convert 
column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring 
struct1 to the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, 
boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, 
fixed1 FROM test_serializer;

I will be entering an all-new JIRA for this.


 Avro SerDe doesn't handle serializing Nullable types that require access to a 
 Schema
 

 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: avro
 Fix For: 0.11.0

 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt


 Deserialization properly handles hiding Nullable Avro types, including 
 complex types like record, map, array, etc. However, when Serialization 
 attempts to write out these types it erroneously makes use of the UNION 
 schema that contains NULL and the other type.
 This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
 Bytes.
 Here's a [review board of unit tests that express the 
 problem|https://reviews.apache.org/r/7431/], as well as one that supports the 
 case that it's only when the schema is needed.



[jira] [Created] (HIVE-4022) Avro SerDe queries don't handle hard-coded nulls for optional/nullable structs

2013-02-14 Thread Michael Malak (JIRA)
Michael Malak created HIVE-4022:
---

 Summary: Avro SerDe queries don't handle hard-coded nulls for 
optional/nullable structs
 Key: HIVE-4022
 URL: https://issues.apache.org/jira/browse/HIVE-4022
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Michael Malak


Related to HIVE-3528,

There is in fact a null-struct test case in line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

does indeed work.  But in that test, the query gets all of its data from a test 
table verbatim:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we stick in a hard-coded null for the struct directly into the 
query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, 
bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, 
bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target 
table because column number/types are different 'as_avro': Cannot convert 
column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring 
struct1 into the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, 
boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, 
fixed1 FROM test_serializer;




[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2340:


Status: Patch Available  (was: Open)

Passed all tests

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: performance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
 HIVE-2340.13.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, 
 HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578829#comment-13578829
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  lso

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java:18 The package 
name doesn't match the directory structure. This doesn't seem to be causing 
the build to fail, but in Eclipse it shows up as an error. Could you adjust 
either the package name or the directory structure so they match?

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578830#comment-13578830
 ] 

Phabricator commented on HIVE-3874:
---

kevinwilfong has commented on the revision HIVE-3874 [jira] Create a new 
Optimized Row Columnar file format for Hive.

  *Ignore the lso

REVISION DETAIL
  https://reviews.facebook.net/D8529

To: JIRA, omalley
Cc: kevinwilfong


 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Created] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)
Bhushan Mandhani created HIVE-4023:
--

 Summary: Improve Error Logging in MetaStore
 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial


The RetryingHMSHandler should log the entire stack trace before throwing an 
exception.
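
The requested improvement can be sketched as capturing the full trace into a string before the rethrow. A minimal illustration; the ErrorLogging class and logging to stdout are assumptions, not the actual RetryingHMSHandler code:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ErrorLogging {
    // Renders a throwable's full stack trace as a string, so the whole
    // trace can be written to the log before the exception is rethrown.
    static String stackTraceToString(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        try {
            throw new RuntimeException("metastore failure");
        } catch (RuntimeException e) {
            // Log the whole trace first (here: stdout); the real handler
            // would then rethrow e to the caller as before.
            System.out.print(stackTraceToString(e));
        }
    }
}
```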



[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

2013-02-14 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated HIVE-2670:
---

Status: Patch Available  (was: Open)

[~alangates], please let me know what you think about the latest patch. 
Thanks.

 A cluster test utility for Hive
 ---

 Key: HIVE-2670
 URL: https://issues.apache.org/jira/browse/HIVE-2670
 Project: Hive
  Issue Type: New Feature
  Components: Testing Infrastructure
Reporter: Alan Gates
Assignee: Johnny Zhang
 Attachments: harness.tar, hive_cluster_test_2.patch, 
 hive_cluster_test_3.patch, hive_cluster_test_4.patch, hive_cluster_test.patch


 Hive has an extensive set of unit tests, but it does not have an 
 infrastructure for testing in a cluster environment.  Pig and HCatalog have 
 been using a test harness for cluster testing for some time.  We have written 
 Hive drivers and tests to run in this harness.



[jira] [Updated] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani updated HIVE-4023:
---

Attachment: HIVE-4023.1.patch.txt

 Improve Error Logging in MetaStore
 --

 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial
 Attachments: HIVE-4023.1.patch.txt


 The RetryingHMSHandler should log the entire stack trace before throwing an 
 exception.



[jira] [Updated] (HIVE-4023) Improve Error Logging in MetaStore

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani updated HIVE-4023:
---

Status: Patch Available  (was: Open)

 Improve Error Logging in MetaStore
 --

 Key: HIVE-4023
 URL: https://issues.apache.org/jira/browse/HIVE-4023
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial
 Attachments: HIVE-4023.1.patch.txt


 The RetryingHMSHandler should log the entire stack trace before throwing an 
 exception.



[jira] [Resolved] (HIVE-3831) Add Command to Turn Sorting Off for a Bucketed Table

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani resolved HIVE-3831.


Resolution: Duplicate

 Add Command to Turn Sorting Off for a Bucketed Table
 

 Key: HIVE-3831
 URL: https://issues.apache.org/jira/browse/HIVE-3831
 Project: Hive
  Issue Type: Bug
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor

 If we have specified a bucketed table as sorted on some columns, there is no 
 Hive command to turn the sorting off for that table. There are scenarios 
 where we need to do this.



[jira] [Resolved] (HIVE-3716) Create Table Like should support TableProperties

2013-02-14 Thread Bhushan Mandhani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhushan Mandhani resolved HIVE-3716.


Resolution: Duplicate

Fixed by Kevin Wilfong in another JIRA.

 Create Table Like should support TableProperties
 

 Key: HIVE-3716
 URL: https://issues.apache.org/jira/browse/HIVE-3716
 Project: Hive
  Issue Type: New Feature
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Trivial

 Create Table Like currently doesn't allow the specification of 
 TableProperties for the created table. It will be useful to allow that.



Hive-trunk-hadoop2 - Build # 122 - Still Failing

2013-02-14 Thread Apache Jenkins Server
Changes for Build #84
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 0.10.0
 (Prasad Mujumdar via cws)


Changes for Build #85

Changes for Build #86
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin 
via Ashutosh Chauhan)

[hashutosh] HIVE-3833 : object inspectors should be initialized based on 
partition metadata (Namit Jain via Ashutosh Chauhan)


Changes for Build #87

Changes for Build #88
[namit] HIVE-3825 Add Operator level Hooks
(Pamela Vagata via namit)

[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types 
that require access to a Schema (Sean Busbey via Ashutosh Chauhan)

[namit] HIVE-3943 Skewed query fails if hdfs path has special characters
(Gang Tim Liu via namit)


Changes for Build #89
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
(Kevin Wilfong via namit)

[namit] HIVE-3944 Make accept qfile argument for miniMR tests
(Navis via namit)


Changes for Build #90
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3921 recursive_dir.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3923 join_filters_overlap.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3924 join_nullsafe.q fails on 0.23
(Sushanth Sowmyan via namit)

[hashutosh] Adding csv.txt file, left out from commit of 3528


Changes for Build #91

Changes for Build #92
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext 
cannot be loaded/instantiated (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion 
(Navis via Ashutosh Chauhan)


Changes for Build #93

Changes for Build #94
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a 
partition through the CLI. (Samuel Yuan via kevinwilfong)


Changes for Build #95
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #96
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784

[hashutosh] HIVE-3784 : de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #97
[namit] HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit)

[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan, Reviewed by Namit Jain)


Changes for Build #98

Changes for Build #99
[kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. 
(Samuel Yuan via kevinwilfong)


Changes for Build #100
[namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
(Gang Tim Liu via namit)


Changes for Build #101

Changes for Build #102

Changes for Build #103

Changes for Build #104
[hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny 
Zhang via Ashutosh Chauhan)

[hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL 
metastore scripts (Mark Grover via Ashutosh Chauhan)


Changes for Build #105
[hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via 
Ashutosh Chauhan)

[namit] HIVE-3917 Support noscan operation for analyze command
(Gang Tim Liu via namit)


Changes for Build #106
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #107

Changes for Build #108

Changes for Build #109

Changes for Build #110
[namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied 
correctly
(Navis via namit)


Changes for Build #111

Changes for Build #112
[namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 
0.9.0 to
0.10.0 (Jarek and Mark via namit)

[namit] HIVE-3999 Mysql metastore upgrade script will end up with different 
schema than
the full schema load (Jarek and Mark via namit)


Changes for Build #113

Changes for Build #114
[namit] HIVE-3995 PostgreSQL upgrade scripts are not valid
(Jarek and Mark via namit)


Changes for Build #115

Changes for Build #116
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #117

Changes for Build #118

Changes for Build #119

Changes for Build #120
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)


Changes for Build #121

Changes for Build #122



32 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try ant test ... 
-Dtest.silent=false to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 

[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-14 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3874:


Status: Open  (was: Patch Available)

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, 
 OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per a file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing light weight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows aren't stored in the file



[jira] [Commented] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578920#comment-13578920
 ] 

Phabricator commented on HIVE-948:
--

navis has commented on the revision HIVE-948 [jira] more query plan 
optimization rules.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:75 
I don't know the issue well, but there is still a rule (R7:MAPJOIN%) about 
mapjoin in the genMapRedTasks() method in SemanticAnalyzer. What's that?

  I'll remove that.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CleanupProcessor.java:47 OK, 
NonBlockingOpDeDupProc will be good. When we need other cleanups, we can 
rename it or do other things.

REVISION DETAIL
  https://reviews.facebook.net/D8463

BRANCH
  DPAL-1980

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis


 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-948:
---

Attachment: HIVE-948.D8463.3.patch

It contains test changes and is too big for Phabricator. 

 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruits that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 



[jira] [Commented] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578922#comment-13578922
 ] 

Ashutosh Chauhan commented on HIVE-4000:


Many tests are resulting in NPEs because we switched to ConcurrentHashMaps, which 
don't allow null keys (as opposed to HashMaps). Stacktrace:
{noformat}
[junit] java.lang.NullPointerException
[junit]     at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:782)
[junit]     at java.util.Collections$SetFromMap.contains(Collections.java:3574)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.extractCounters(QueryPlan.java:364)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.getQueryPlan(QueryPlan.java:444)
[junit]     at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:617)
[junit]     at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:503)
[junit]     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:147)
[junit]     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
[junit]     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1343)
[junit]     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1129)
[junit]     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
[junit]     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
[junit]     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
[junit]     at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5759)
[junit]     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop(TestCliDriver.java:1923)
{noformat}

It seems we can fix this by modifying QueryPlan.java:364 to:
{code}
if (task.getId() != null && started.contains(task.getId())
    && done.contains(task.getId())) {
  continue;
}
{code}
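The contract difference behind the NPE can be seen in a small standalone sketch (illustrative code, not part of Hive):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        // HashMap tolerates a null key: containsKey(null) simply answers the question.
        Map<String, String> plain = new HashMap<>();
        System.out.println(plain.containsKey(null)); // prints "false"

        // ConcurrentHashMap rejects null keys (and values) outright.
        Map<String, String> concurrent = new ConcurrentHashMap<>();
        try {
            concurrent.containsKey(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from ConcurrentHashMap.containsKey(null)");
        }
    }
}
```

This is why the suggested fix guards the lookup with a `task.getId() != null` check before touching the concurrent sets.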

 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.



[jira] [Updated] (HIVE-948) more query plan optimization rules

2013-02-14 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-948:
-

Attachment: HIVE-948.D8463.3.patch

navis updated the revision HIVE-948 [jira] more query plan optimization rules.

  Addressed comments

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8463

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8463?vs=27603&id=27807#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java

To: JIRA, ashutoshc, navis


 more query plan optimization rules 
 ---

 Key: HIVE-948
 URL: https://issues.apache.org/jira/browse/HIVE-948
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Navis
 Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, 
 HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch


 Many query plans are not optimal in that they contain redundant operators. 
 Some examples are unnecessary select operators (select followed by select, 
 select output being the same as input etc.). Even though these operators are 
 not very expensive, they could account for around 10% of CPU time in some 
 simple queries. It seems they are low-hanging fruit that we should pick 
 first. 
 BTW, it seems these optimization rules should be added at the last stage of 
 the physical optimization phase since some redundant operators are added to 
 facilitate physical plan generation. 
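As an illustration only (not the actual patch), a rule that removes an identity select from a linear operator chain might look like the following sketch; the `Op` type and field names are invented for the example:

```java
import java.util.List;

public class IdentitySelectRule {
    // Minimal stand-in for a plan operator; names are illustrative, not Hive's.
    static class Op {
        final String type;
        final List<String> inputCols;
        final List<String> outputCols;
        Op child;

        Op(String type, List<String> inputCols, List<String> outputCols) {
            this.type = type;
            this.inputCols = inputCols;
            this.outputCols = outputCols;
        }
    }

    /** Splices out SELECT operators whose output schema equals their input schema. */
    static Op apply(Op op) {
        if (op == null) {
            return null;
        }
        op.child = apply(op.child);
        if ("SEL".equals(op.type) && op.inputCols.equals(op.outputCols)) {
            return op.child; // identity select: redundant, remove it
        }
        return op;
    }
}
```

A select that actually projects a subset of columns does not match the rule and is left in place.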



[jira] [Updated] (HIVE-4000) Hive client goes into infinite loop at 100% cpu

2013-02-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4000:
---

Status: Open  (was: Patch Available)

Also, Owen, can you add a sample query and a scenario where this problem shows 
up?

 Hive client goes into infinite loop at 100% cpu
 ---

 Key: HIVE-4000
 URL: https://issues.apache.org/jira/browse/HIVE-4000
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.10.1

 Attachments: HIVE-4000.D8493.1.patch


 The Hive client starts multiple threads to track the progress of the 
 MapReduce jobs. Unfortunately those threads access several static HashMaps 
 that are not protected by locks. When the HashMaps are modified, they 
 sometimes cause race conditions that lead to the client threads getting stuck 
 in infinite loops.



[jira] [Created] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created HIVE-4024:


 Summary: Derby metastore update script will fail when upgrading 
from 0.9.0 to 0.10.0
 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor


The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
contains the following line:

{code}
ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
{code}

This statement will, however, fail if the table SDS has at least one row.
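One common workaround pattern is to supply a default value so that existing rows can satisfy the new NOT NULL constraint; the exact Derby syntax and the default value shown below are assumptions for illustration, not the committed patch:

```sql
-- Assumed workaround: give the new NOT NULL column a default so the
-- ALTER succeeds even when SDS already contains rows.
ALTER TABLE SDS ADD IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL DEFAULT 'N';
```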



Review Request: HIVE-4024 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9462/
---

Review request for hive.


Description
---

I've provided similar solution as in HIVE-3995, HIVE-3998 and HIVE-3999.


This addresses bug HIVE-4024.
https://issues.apache.org/jira/browse/HIVE-4024


Diffs
-

  /trunk/metastore/scripts/upgrade/derby/011-HIVE-3649.derby.sql 1443292 

Diff: https://reviews.apache.org/r/9462/diff/


Testing
---

I've tested the upgrade procedure.


Thanks,

Jarek Cecho



[jira] [Updated] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4024:
-

Attachment: bugHIVE-4024.patch

 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0
 ---

 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: bugHIVE-4024.patch


 The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
 contains the following line:
 {code}
 ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
 {code}
 This statement will, however, fail if the table SDS has at least one row.



[jira] [Updated] (HIVE-4024) Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0

2013-02-14 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-4024:
-

Status: Patch Available  (was: Open)

 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0
 ---

 Key: HIVE-4024
 URL: https://issues.apache.org/jira/browse/HIVE-4024
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Minor
 Attachments: bugHIVE-4024.patch


 The problem is in the following file, {{011-HIVE-3649.derby.sql}}, which 
 contains the following line:
 {code}
 ALTER TABLE SDS ADD  IS_STOREDASSUBDIRECTORIES CHAR(1) NOT NULL;
 {code}
 This statement will, however, fail if the table SDS has at least one row.



[jira] [Resolved] (HIVE-201) fetch task appears as a root task in explain plan

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-201.


Resolution: Duplicate

 fetch task appears as a root task in explain plan
 -

 Key: HIVE-201
 URL: https://issues.apache.org/jira/browse/HIVE-201
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 fetch task appears as a root task in explain plan. that should be changed so 
 that fetch task depends on the execution task appropriately



[jira] [Reopened] (HIVE-201) fetch task appears as a root task in explain plan

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reopened HIVE-201:



Sorry, mistakenly pushed resolve.

 fetch task appears as a root task in explain plan
 -

 Key: HIVE-201
 URL: https://issues.apache.org/jira/browse/HIVE-201
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 fetch task appears as a root task in explain plan. that should be changed so 
 that fetch task depends on the execution task appropriately



[jira] [Resolved] (HIVE-1035) limit can be optimized if the limit is happening on the reducer

2013-02-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-1035.
-

Resolution: Duplicate

Applied option-1 in HIVE-3550

 limit can be optimized if the limit is happening on the reducer
 ---

 Key: HIVE-1035
 URL: https://issues.apache.org/jira/browse/HIVE-1035
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 A query like:
 select ... from A join B..  limit 10;
 where the limit is performed on the reducer can be further optimized.
 Currently, all the operators on the reduce side will be done, but the 
 ExecReducer will un-necessarily deserialize all the rows.
 The following optimizations can be done:
 1. Do nothing in reduce() in ExecReducer.
 2. Modify map-reduce framework so that it does not even invoke the reduce() 
 method in ExecReducer.
 Option 2 may require some work from Hadoop, but we should minimally do option 1. 
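Optimization 1 amounts to a guard at the top of reduce(); a minimal language-level sketch (not the actual ExecReducer code, class and method names are illustrative):

```java
// Illustrative sketch of optimization 1: once LIMIT rows have been handled,
// reduce() bails out immediately instead of deserializing further rows.
public class LimitShortCircuit {
    private final int limit;
    private int emitted = 0;

    public LimitShortCircuit(int limit) {
        this.limit = limit;
    }

    /** Returns true if the row was processed, false once the limit is reached. */
    public boolean reduce(Object row) {
        if (emitted >= limit) {
            return false; // skip deserialization and the operator pipeline
        }
        emitted++;
        // ... the real reducer would forward the row through its operators here ...
        return true;
    }
}
```

Optimization 2 would go further and avoid invoking reduce() at all, which is why it may need framework support.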



[jira] [Commented] (HIVE-3997) Use distributed cache to cache/localize dimension table & filter it in map task setup

2013-02-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579018#comment-13579018
 ] 

Gopal V commented on HIVE-3997:
---

All map tasks are happening in a single wave, but the hashtable generation 
before the map-side task is taking 2x the time it took on the client node.

This is probably due to CPU starvation in the map task caused by too many 
parallel tasks; I couldn't find a way to tune the map count per node down from 
12 (its current value), because the NodeManager does not seem to have a tunable 
for it (?).

 Use distributed cache to cache/localize dimension table & filter it in map 
 task setup
 -

 Key: HIVE-3997
 URL: https://issues.apache.org/jira/browse/HIVE-3997
 Project: Hive
  Issue Type: Improvement
Reporter: Gopal V
Assignee: Gopal V

 The Hive clients are not always co-located with the Hadoop/HDFS cluster.
 This means that dimension table filtering, when done on the client side, 
 becomes very slow. Not only that, the conversion of the small tables into 
 hashtables has to be done every single time a query is run with different 
 filters on the big table.
 That entire hashtable has to be part of the job, which involves even more 
 HDFS writes from the far client side.
 Using the distributed cache also has the advantage that the localized files 
 can be kept between jobs instead of firing off an HDFS read for every query.
 Moving the operator pipeline for the hash generation into the map task itself 
 has perhaps a few cons.
 The map task might OOM due to this change, and it will take longer to recover 
 since all the map attempts must fail first, instead of the decision being made 
 conditionally on the client. The client has no idea how much memory the 
 hashtable needs and has to rely on the disk sizes (compressed sizes, perhaps) 
 to determine if it needs to fall back onto a reduce-join instead.
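As a rough sketch of the map-task-setup approach, the hashtable can be built from the localized dimension-table file when the task starts; the class name, method name, and key<TAB>value file format below are assumptions for illustration, not Hive's API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

public class DimTableHashLoader {
    /**
     * Builds the join hashtable from a localized dimension-table file during
     * task setup, instead of shipping a prebuilt hashtable from the client.
     * Assumes key<TAB>value rows; malformed rows are skipped.
     */
    public static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip rows without a key/value separator
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }
}
```

In the distributed-cache scheme, the `Reader` would come from the file the framework localized on the node, so the HDFS read happens once per node rather than once per query.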
