Changes for Build #261

Changes for Build #262
[daijy] HIVE-4591: Making changes to webhcat-site.xml have no effect

[daijy] HIVE-4784 : ant testreport doesn't include any HCatalog tests


Changes for Build #263
[omalley] Undoing the accidental update of the 0.11 release notes.

[omalley] HIVE-4478. In ORC remove ispresent stream from columns that contain 
no null values in a stripe. (Prasanth Jayachandran via omalley)


Changes for Build #264

Changes for Build #265
[hashutosh] HIVE-4740 : HIVE-2379 is missing hbase.jar itself (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-4756 : Upgrade Hadoop 0.23 profile to 2.0.5-alpha (Brock 
Noland via Ashutosh Chauhan)

[hashutosh] HIVE-4791 : improve test coverage of package 
org.apache.hadoop.hive.ql.udf.xml (Ivan Veselovsky via Ashutosh Chauhan)


Changes for Build #266
[hashutosh] HIVE-4618 : show create table creating unusable DDL when field 
delimiter is \001 (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4559 : hcatalog/webhcat scripts in tar.gz don't have execute 
permissions set (Eugene Koifman via Ashutosh Chauhan)

[hashutosh] HIVE-4798 : NPE when we call isSame from an instance of 
ExprNodeConstantDesc with null value (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4781 : LEFT SEMI JOIN generates wrong results when the number 
of rows belonging to a single key of the right table exceeds 
hive.join.emit.interval (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4647 : RetryingHMSHandler logs too many error messages (Navis 
via Ashutosh Chauhan)

[hashutosh] HIVE-4692 : Constant agg parameters will be replaced by 
ExprNodeColumnDesc with single-sourced multi-gby cases (Navis via Ashutosh 
Chauhan)


Changes for Build #267
[hashutosh] HIVE-4781 : Adding new data files for tests. Missed in original 
commit.


Changes for Build #268
[navis] HIVE-2517 : Support group by on struct type (Ashutosh Chauhan via Navis)

[hashutosh] HIVE-4406 : Missing / or /<dbname> in hs2 jdbc uri switches mode to 
embedded mode (Anandha Ranganathan via Ashutosh Chauhan)

[hashutosh] HIVE-4430 : Semantic analysis fails in presence of certain literals 
in on clause (Kevin Wilfong via Ashutosh Chauhan)

[hashutosh] HIVE-4757 : LazyTimestamp goes into irretrievable NULL mode once 
inited with NULL (Gopal V via Ashutosh Chauhan)

[hashutosh] HIVE-4785 : Implement isCaseSensitive for Hive JDBC driver (Robert 
Roland via Ashutosh Chauhan)


Changes for Build #269
[navis] HIVE-4436 : hive.exec.parallel=true doesn't work on hadoop-2 (Gopal V 
via Navis)


Changes for Build #270
[hashutosh] HIVE-4689 : For outer joins, joinEmitInterval might produce wrong 
results (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3253 : ArrayIndexOutOfBounds exception for deeply nested 
structs (Thejas Nair via Ashutosh Chauhan)


Changes for Build #271

Changes for Build #272

Changes for Build #273
[hashutosh] HIVE-4089 : javax.jdo : jdo2-api dependency not in Maven Central 
(Navis via Ashutosh Chauhan)

[ecapriolo] HIVE-4804 parallel order by fails for small datasets (Navis via egc)

Submitted by:   Navis
Reviewed by:    Edward Capriolo


Changes for Build #274

Changes for Build #275
[hashutosh] HIVE-4811 : (Slightly) break up the SemanticAnalyzer monstrosity 
(Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4814 : Adjust WebHCat e2e tests until HIVE-4703 is addressed 
(Eugene Koifman via Ashutosh Chauhan)


Changes for Build #276
[hashutosh] HIVE-4251 : Indices can't be built on tables whose schema info 
comes from SerDe (Mark Wagner via Ashutosh Chauhan)

[hashutosh] HIVE-4805 : Enhance coverage of package 
org.apache.hadoop.hive.ql.exec.errors (Ivan Veselovsky via Ashutosh Chauhan)


Changes for Build #277
[hashutosh] HIVE-4733 : HiveLockObjectData is not compared properly (Navis via 
Ashutosh Chauhan)

[ecapriolo] HIVE-3475 INLINE UDTF does not convert types properly (Igor Kabiljo 
and Navis Ryu via egc)

Submitted by:   Navis Ryu and Igor Kabiljo
Reviewed by:    Edward Capriolo


Changes for Build #278
[hashutosh] HIVE-4802 : Fix url check for missing / or /<db> after hostname in 
jdbc uri (Thejas Nair via Ashutosh Chauhan)


Changes for Build #279
[hashutosh] HIVE-3810 : HiveHistory.log needs to replace \r with space before 
writing Entry.value to historyfile (Mark Grover via Ashutosh Chauhan)


Changes for Build #280
[hashutosh] HIVE-4819 : Comments in CommonJoinOperator for aliasTag are not 
valid (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4813 : Improve test coverage of package 
org.apache.hadoop.hive.ql.optimizer.pcr (Ivan Veselovsky via Ashutosh Chauhan)

[hashutosh] HIVE-4580 : Change DDLTask to report errors using canonical error 
messages rather than http status codes (Eugene Koifman via Ashutosh Chauhan)

[hashutosh] HIVE-4796 : Increase coverage of package 
org.apache.hadoop.hive.common.metrics (Ivan Veselovsky via Ashutosh Chauhan)

[navis] HIVE-4812 : Logical explain plan (Gunther Hagleitner via Navis)


Changes for Build #281
[hashutosh] HIVE-4833 : Fix eclipse template classpath to include the correct 
jdo lib (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4830 : Test clientnegative/nested_complex_neg.q got broken due 
to HIVE-4580 (Vikram Dixit via Ashutosh Chauhan)

[hashutosh] HIVE-4810 [jira] Refactor exec package
(Gunther Hagleitner via Ashutosh Chauhan)

Summary:
HIVE-4810

The exec package contains both operators and classes used to execute the job. 
Moving the latter into a subpackage makes the package slightly more manageable 
and will make it easier to provide a Tez-based implementation.

Test Plan: Refactoring

Reviewers: ashutoshc

Reviewed By: ashutoshc

Differential Revision: https://reviews.facebook.net/D11625

[hashutosh] HIVE-4829 : TestWebHCatE2e checkstyle violation causes all tests to 
fail (Eugene Koifman via Ashutosh Chauhan)


Changes for Build #282
[hashutosh] HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li & Renata 
Ghisloti via Ashutosh Chauhan)

[hashutosh] HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh 
Chauhan)


Changes for Build #283

Changes for Build #284

Changes for Build #285
[hashutosh] HIVE-4840 : Fix eclipse template classpath to include the BoneCP 
lib (Yin Huai via Ashutosh Chauhan)


Changes for Build #286
[navis] HIVE-4290 : Build profiles: Partial builds for quicker dev (Gunther 
Hagleitner via Navis)

[navis] HIVE-4658 : Make KW_OUTER optional in outer joins (Edward Capriolo via 
Navis)


Changes for Build #287

Changes for Build #288

Changes for Build #289
[hashutosh] HIVE-4852 : -Dbuild.profile=core fails (Gunther Hagleitner via 
Ashutosh Chauhan)

[hashutosh] HIVE-4854 : testCliDriver_load_hdfs_file_with_space_in_the_name 
fails on hadoop 2 (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4853 : junit timeout needs to be updated (Gunther Hagleitner 
via Ashutosh Chauhan)

[hashutosh] HIVE-4721 : Fix TestCliDriver.ptf_npath.q on 0.23 (Gunther 
Hagleitner via Ashutosh Chauhan)


Changes for Build #290
[ecapriolo] HIVE-3603 Enable client-side caching for scans on HBase (Navis Ryu 
via EGC)

Submitted by:   Navis Ryu
Reviewed by:    Edward Capriolo


Changes for Build #291
[hashutosh] HIVE-4845 : Correctness issue with MapJoins using the null-safe 
operator (Brock Noland via Ashutosh Chauhan)


Changes for Build #292
[daijy] HIVE-4820 : webhcat_config.sh should set default values for HIVE_HOME 
and HCAT_PREFIX that work with default build tree structure (Eugene Koifman via 
Jianyong Dai)


Changes for Build #293
[brock] HIVE-4865 - HiveLockObjects: Unlocking retries/times out when query 
contains ":" (Gunther Hagleitner via Brock Noland)


Changes for Build #294
[hashutosh] HIVE-2206 [jira] add a new optimizer for query correlation 
discovery and optimization
(Yin Huai via Ashutosh Chauhan)

Summary:
update test results

This issue proposes a new logical optimizer called Correlation Optimizer, which 
is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The 
idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and 
slides of YSmart are linked at the bottom.

Since Hive translates queries in a sentence-by-sentence fashion, for every 
operation which may need to shuffle the data (e.g. join and aggregation 
operations), Hive will generate a MapReduce job for that operation. However, 
some of these shuffling operations have correlations (explained below) that 
allow them to be executed in a single MR job.

        Input Correlation: Multiple MR jobs have input correlation (IC) if 
        their input relation sets are not disjoint;
        Transit Correlation: Multiple MR jobs have transit correlation (TC) if 
        they have not only input correlation, but also the same partition key;
        Job Flow Correlation: An MR job has job flow correlation (JFC) with one 
        of its child nodes if it has the same partition key as that child node.
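To make the three definitions above concrete, here is a minimal Python sketch. MRJob and the has_* helpers are hypothetical illustrations, not classes from Hive's correlation optimizer; each job is reduced to its input relations, its partition key, and its child nodes, and it assumes (as in the YSmart plan tree) that a node's children are the jobs whose output it consumes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(eq=False)  # identity comparison, so each plan node is a distinct object
class MRJob:
    """Hypothetical plan node standing in for one MapReduce job."""
    name: str
    inputs: FrozenSet[str]           # input relations read by the job
    partition_key: Tuple[str, ...]   # columns used to partition the shuffle
    children: List["MRJob"] = field(default_factory=list)  # jobs feeding this one


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    # IC: the input relation sets of the two jobs are not disjoint.
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    # TC: input correlation plus the same partition key.
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(node: MRJob, child: MRJob) -> bool:
    # JFC: node has the same partition key as one of its child nodes.
    return child in node.children and node.partition_key == child.partition_key


# Two aggregations over table "t" grouped by "k", consumed by a join on "k".
agg1 = MRJob("agg1", frozenset({"t"}), ("k",))
agg2 = MRJob("agg2", frozenset({"t"}), ("k",))
join = MRJob("join", frozenset(), ("k",), children=[agg1, agg2])

print(has_input_correlation(agg1, agg2))     # True
print(has_transit_correlation(agg1, agg2))   # True
print(has_job_flow_correlation(join, agg1))  # True
```

In this toy plan all three correlations hold, which is the situation where merging the three jobs into one MR job is attractive.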

The current implementation of the correlation optimizer only detects 
correlations among MR jobs for reduce-side join operators and reduce-side 
aggregation operators (not map-only aggregation). A query will be optimized if 
it satisfies the following conditions.

        There exists an MR job for a reduce-side join operator or reduce-side 
        aggregation operator which has JFC with all of its parent MR jobs (TCs 
        will also be exploited if JFC exists);
        All input tables of those correlated MR jobs are original input tables 
        (not intermediate tables generated by sub-queries); and
        No self join is involved in those correlated MR jobs.
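A minimal Python sketch of the three eligibility conditions listed above, assuming a toy plan model. Job, REDUCE_SIDE_KINDS, reads_intermediate and is_optimizable are hypothetical names, not Hive APIs; "parents" here means the upstream jobs whose output a job consumes, and TC exploitation is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical labels for reduce-side join and reduce-side aggregation jobs.
REDUCE_SIDE_KINDS = {"join", "groupby"}


@dataclass(eq=False)
class Job:
    """Hypothetical plan node for one MR job (not a Hive class)."""
    name: str
    kind: str                         # e.g. "join", "groupby", "mapjoin"
    partition_key: Tuple[str, ...]
    tables: List[str]                 # tables read directly by this job
    parents: List["Job"] = field(default_factory=list)  # upstream jobs it consumes
    reads_intermediate: bool = False  # True if it reads a sub-query's output


def is_optimizable(job: Job) -> bool:
    # Condition 1: a reduce-side join/aggregation with JFC to every parent job.
    if job.kind not in REDUCE_SIDE_KINDS or not job.parents:
        return False
    if any(p.partition_key != job.partition_key for p in job.parents):
        return False
    correlated = [job] + job.parents
    # Condition 2: the correlated jobs read only original input tables.
    if any(j.reads_intermediate for j in correlated):
        return False
    # Condition 3: no self join (the same table listed twice in one join job).
    for j in correlated:
        if j.kind == "join" and len(set(j.tables)) < len(j.tables):
            return False
    return True


g1 = Job("g1", "groupby", ("k",), ["t"])
g2 = Job("g2", "groupby", ("k",), ["s"])
top = Job("top", "join", ("k",), [], parents=[g1, g2])
print(is_optimizable(top))  # True
```

Changing either parent's partition key, marking a parent as reading a sub-query result, or making a parent a self join on "t" makes the check return False.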

The correlation optimizer is implemented as a logical optimizer. The main 
reasons are that it only needs to manipulate the query plan tree and it can 
leverage the existing component for generating MR jobs.

The current implementation can serve as a framework for correlation-related 
optimizations. I think that this is better than adding individual optimizers.

There is further work that can be done in the future to improve this 
optimizer. Here are three examples.

        Support queries that only involve TC;
        Support queries in which input tables of correlated MR jobs involve 
        intermediate tables; and
        Optimize queries involving self join.

References:
Paper and presentation of YSmart.
Paper: 
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

Test Plan: EMPTY

Reviewers: JIRA, ashutoshc

Reviewed By: ashutoshc

CC: brock

Differential Revision: https://reviews.facebook.net/D11097

[ecapriolo] HIVE-4873 Sort candidate functions in case of UDFArgumentException 
(Xuefu Zhang via egc)

Submitted by:   Xuefu Zhang
Reviewed by:    Edward Capriolo


Changes for Build #295
[ecapriolo] HIVE-4675 Create new parallel unit test environment (Brock Noland 
via egc)

Submitted by:   Brock Noland
Reviewed by:    Edward Capriolo




No tests ran.

The Apache Jenkins build system has built Hive-trunk-hadoop2 (build #295)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-hadoop2/295/ 
to view the results.
