GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/146
SPARK-1251 Support for optimizing and executing structured queries
This pull request adds support to Spark for working with structured data
using a simple SQL dialect, HiveQL and a Scala Query DSL.
*This is being contributed as a new __alpha component__ to Spark and does
not modify Spark core or other components.*
The code is broken into three primary components:
- Catalyst (sql/catalyst) - An implementation-agnostic framework for
manipulating trees of relational operators and expressions.
- Execution (sql/core) - A query planner / execution engine for
translating Catalyst's logical query plans into Spark RDDs. This component
also includes a new public interface, SqlContext, that allows users to execute
SQL or structured Scala queries against existing RDDs and Parquet files.
- Hive Metastore Support (sql/hive) - An extension of SqlContext called
HiveContext that allows users to write queries using a subset of HiveQL and
access data from a Hive Metastore using Hive SerDes. There are also wrappers
that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
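To give a feel for what "manipulating trees of relational operators and expressions" means, here is a minimal sketch in plain Scala. It is not Catalyst's actual API: the names `Expr`, `Literal`, `Attribute`, `Add`, `transformUp` and `constantFold` are hypothetical and chosen only for illustration. It assumes a rewrite rule can be written as a partial function that is applied bottom-up over an immutable tree.

```scala
// Hypothetical mini expression tree; illustrative names, not Catalyst's real classes.
sealed trait Expr {
  // Apply `rule` bottom-up: children are rewritten first, then the node itself.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren: Expr = this match {
      case Add(l, r) => Add(l.transformUp(rule), r.transformUp(rule))
      case leaf      => leaf
    }
    // Nodes the rule does not match are returned unchanged.
    rule.applyOrElse(withNewChildren, (e: Expr) => e)
  }
}
case class Literal(value: Int) extends Expr
case class Attribute(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object CatalystSketch {
  // A rewrite rule: fold the addition of two literals into a single literal.
  val constantFold: PartialFunction[Expr, Expr] = {
    case Add(Literal(a), Literal(b)) => Literal(a + b)
  }

  def main(args: Array[String]): Unit = {
    // Represents x + (1 + 2)
    val tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
    println(tree.transformUp(constantFold)) // prints Add(Attribute(x),Literal(3))
  }
}
```

Because the tree is immutable and the rule is only a pattern match, a rule of this kind can be tested in isolation and composed with other rules.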
A more complete design of this new component can be found in [the
associated JIRA](https://spark-project.atlassian.net/browse/SPARK-1251).
[An updated version of the Spark documentation, including API Docs for all three sub-components,](http://www.cs.berkeley.edu/~marmbrus/sparkdocs/_site/sql-programming-guide.html)
is also available for review.
With this PR comes support for inferring the schema of existing RDDs that
contain case classes. Using this information, developers can now express
structured queries that are automatically compiled into RDD operations.
```scala
import org.apache.spark.rdd.RDD

// Define the schema using a case class.
case class Person(name: String, age: Int)
val people: RDD[Person] =
  sc.textFile("people.txt").map(_.split(",")).map(p => Person(p(0), p(1).toInt))
// The following is the same as 'SELECT name FROM people WHERE age >= 10 AND age <= 19'
val teenagers = people.where('age >= 10).where('age <= 19).select('name).toRdd
```
RDDs can also be registered as Tables, allowing SQL queries to be written
over them.
```scala
people.registerAsTable("people")
val teenagers = sql("SELECT name FROM people WHERE age >= 10 AND age <= 19")
```
The results of queries are themselves RDDs and support standard RDD
operations:
```scala
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
```
Finally, with the optional Hive support, users can read and write data
located in existing Apache Hive deployments using HiveQL.
```scala
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql("LOAD DATA LOCAL INPATH 'src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
sql("SELECT key, value FROM src").collect().foreach(println)
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marmbrus/spark catalyst
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/146.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #146
----
commit 5dab0bc9e94b7b81d03e8b2bc22a72897a907d37
Author: Michael Armbrust <[email protected]>
Date: 2014-01-28T23:31:07Z
Merge pull request #26 from liancheng/serdeAndPartitionPruning
Hive SerDe support and partition pruning optimization
commit 677eb073f635815a2aa22a49ed466b84c785d6ed
Author: Michael Armbrust <[email protected]>
Date: 2014-01-29T00:14:18Z
Update test whitelist.
commit d4f539a9a7c0210b609e68a0fa49b1d2922b1205
Author: Michael Armbrust <[email protected]>
Date: 2014-01-29T04:15:38Z
blacklist mr and user specific tests.
commit 4c89d6ea16c4de05d45a8336ef3808d96cc3abe4
Author: Reynold Xin <[email protected]>
Date: 2014-01-29T04:43:31Z
Merge pull request #27 from marmbrus/moreTests
Update test whitelist.
commit ebb56faaec54c970fa49e8c575facfd6658e37ea
Author: Michael Armbrust <[email protected]>
Date: 2014-01-29T06:27:35Z
add travis config
commit 8ee41be08034e1a66ec13a0ed66a1b59a3ad0aaa
Author: Lian, Cheng <[email protected]>
Date: 2014-01-30T14:38:45Z
Minor refactoring
commit 2486fb71dc89f915c4f54a95e42211c79fc99e4c
Author: Lian, Cheng <[email protected]>
Date: 2014-01-30T14:39:00Z
Fixed spelling
commit 61e729cc21afcafe64af1befee2efb54271bf6d8
Author: Lian, Cheng <[email protected]>
Date: 2014-01-30T14:39:37Z
Added ColumnPrunings strategy and test cases
commit 605255eb979416edc19c005f0bc7b8d5f13dd44b
Author: Reynold Xin <[email protected]>
Date: 2014-01-30T22:55:06Z
Added scalastyle checker.
commit 08e4d0589056f3ae6e117689596420bbf7fbbbc2
Author: Reynold Xin <[email protected]>
Date: 2014-01-30T23:59:55Z
First round of style cleanup.
commit 7213a2c466d7e30cabb2a2fd07bc81a8d7e36cfe
Author: Reynold Xin <[email protected]>
Date: 2014-01-31T00:14:32Z
style fix for Hive.scala.
commit 5c1e60043c4b60529936f93a1536d021f28a2460
Author: Reynold Xin <[email protected]>
Date: 2014-01-31T00:18:55Z
Added hash code implementation for AttributeReference
commit 7e24436da3de67e3b33c310d0c761b2c8e3d11bd
Author: Reynold Xin <[email protected]>
Date: 2014-01-31T00:34:59Z
Removed dependency on JDK 7 (nio.file).
commit 41bbee67d888f8773a1b02ecc5abd957cda033ee
Author: Yin Huai <[email protected]>
Date: 2014-01-31T05:31:15Z
Merge remote-tracking branch 'upstream/master' into exchangeOperator
Conflicts:
build.sbt
src/main/scala/catalyst/execution/SharkInstance.scala
commit f47c2f6f3572cb15da916c0efab7839e485ec905
Author: Yin Huai <[email protected]>
Date: 2014-01-31T06:32:00Z
set outputPartitioning in BroadcastNestedLoopJoin
commit d91e276fb303a878bb54ba156a3087c204f0e167
Author: Michael Armbrust <[email protected]>
Date: 2014-01-31T21:40:59Z
Remove dependence on HIVE_HOME for running tests. This was done by moving
all the hive query test (from branch-0.12) and data files into src/test/hive.
These are used by default when HIVE_HOME is not set.
commit bce024d4a4d7bd8ef3443dcf9dcd367afeaf1837
Author: Michael Armbrust <[email protected]>
Date: 2014-01-31T22:54:10Z
Merge remote-tracking branch 'databricks/master' into style
Disable if brace checking as it errors in single line functional cases
unlike the style guide.
Conflicts:
src/main/scala/catalyst/execution/TestShark.scala
commit d20b565a36533245d0357b18332e8c8658821a2e
Author: Michael Armbrust <[email protected]>
Date: 2014-01-31T23:10:04Z
fix if style
commit 807b2d7ce15ef78f73acfe4950a8fd14b6784545
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T00:03:46Z
check style and publish docs with travis
commit d3a3d48d6ad2aa3562b0859f2af13dd8d8b75fd7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T00:12:33Z
add testing to travis
commit 271e483d65dc41a4feb6f9f4018379094c4ff0bf
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T00:28:47Z
Update build status icon.
[no ci]
commit 6015f932176c291556e13d0e08abd42ad8fdddab
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T00:38:19Z
Merge pull request #29 from rxin/style
Scala style checker & style fixes
commit fc67b5078c23c88b6387cf2b948d84a99cc87e08
Author: Yin Huai <[email protected]>
Date: 2014-02-01T00:46:18Z
Check for a Sort operator with the global flag set instead of an Exchange
operator with a RangePartitioning.
commit 235cbb436756cfeb915fe1864b66277c067b5abd
Author: Yin Huai <[email protected]>
Date: 2014-02-01T00:57:14Z
Merge remote-tracking branch 'upstream/master' into exchangeOperator
Conflicts:
src/main/scala/catalyst/execution/aggregates.scala
src/main/scala/catalyst/expressions/Evaluate.scala
commit 45b334b4d06d254c3b9a8f03b2e64f14b48a3c88
Author: Yin Huai <[email protected]>
Date: 2014-02-01T01:11:07Z
fix comments
commit e079f2b32d3391bdfe835ca66dde7eaedf5df5c0
Author: Timothy Chen <[email protected]>
Date: 2014-01-16T06:53:00Z
Add GenericUDAF wrapper and HiveUDAFFunction
commit 8e0931f1ca55aff597132c6a27ed058866680db5
Author: Michael Armbrust <[email protected]>
Date: 2014-01-28T22:15:03Z
Cast to avoid using deprecated hive API.
commit b1151a8a13b6a3cd1dfa53115b67610955112d66
Author: Timothy Chen <[email protected]>
Date: 2014-01-29T17:58:26Z
Fix load data regex
commit 5b7afd8f7b2f77f3e97b94228fee6f6b92c858be
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T19:57:06Z
Merge pull request #10 from yhuai/exchangeOperator
Exchange operator
commit 6eb59608a17ace6a39638a1fdf24241403642578
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:09:02Z
Merge remote-tracking branch 'databricks/master' into udafs
Conflicts:
src/main/scala/catalyst/execution/aggregates.scala
commit 41b41f3c6ff0b06e6ac76a6a17c929c3bae8be8a
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T09:39:11Z
Only cast unresolved inserts.
commit 63003e90fb70e13d22ad7e260e29897286a7776b
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:37:58Z
Fix spacing.
commit 2de89d0807307f0944d79fb525d18bc2464ebf49
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:38:18Z
Merge pull request #13 from tnachen/master
Add GenericUDAF wrapper and HiveUDAFFunction
commit cb775ac99241f26461a19646b9c6db660a6a2eeb
Author: Michael Armbrust <[email protected]>
Date: 2014-01-12T22:15:44Z
get rid of SharkContext singleton
commit dfb67aa73ce15d9a9c355afaa1d690b3aad41843
Author: Michael Armbrust <[email protected]>
Date: 2014-01-13T01:47:55Z
add test case
commit 19bfd74f9b7a3cc9dc7b7cc6477908abbd6826d9
Author: Michael Armbrust <[email protected]>
Date: 2014-01-22T07:08:31Z
store hive output in circular buffer
commit 1590568ddbeee565bc483ccfe089b287433643a4
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T01:57:48Z
add log4j.properties
commit b649c20a124ef2e7cd8c026ffb06be759d608cec
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T05:13:30Z
fix test logging / caching.
commit 784536466cc3fe69ea230f0e63f7c4cd670fdadc
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T05:13:40Z
deactivate concurrent test.
commit ea6f37f740a5dfef3ca0c2f82e4c26ed3171851c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T05:13:53Z
fix style.
commit 82163e3e3c21804898e576e3a224e3a644e75d27
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T06:26:58Z
special case handling of partitionKeys when casting insert into tables
commit 9c22b4ebdda3955a88800dcf0dec0d14748394e7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:13:44Z
Support for parsing nested types.
commit efa72170ebe27d84cb5ae2efeaed4054ceca1f9c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:19:31Z
Support for reading structs in HiveTableScan.
commit d670e41dfaf93bc322079d5e93b938c2f868932c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:19:47Z
Print nested fields like hive does.
commit dc6463acaccfbdf3bae41ca746b678cb3b70cf9a
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:20:11Z
Support for resolving access to nested fields using "." notation.
commit 67094413d86c0d03fbb717a99916b9c906552d67
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:20:26Z
Evaluation for accessing nested fields.
commit da7ae9da830a5260478a5d9cd4959bb5f3565df2
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:21:11Z
Add boolean writable that was breaking udf_regexp test. Not sure how this
was passing before...
commit 6420c7c23b1fcbae009ce97c5dd2dc9ece75f0a0
Author: Michael Armbrust <[email protected]>
Date: 2014-02-01T02:28:56Z
Memoize the ordinal in the GetField expression.
commit 1579eecca917152c542a68149eddd636131dbb2f
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T09:39:11Z
Only cast unresolved inserts.
commit cf8d99257ad87063bca4bc3a2d5a09b54a2cf2b1
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:00:51Z
Use built in functions for creating temp directory.
commit c654f19ef6fec54537a4e704234b63c65c7e0d1e
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:01:51Z
Support for list and maps in hive table scan.
commit c3feda75938565b85ff401aeb29bdcb44e7accdc
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:02:06Z
use toArray.
commit a9388fb7274fe40b9d10eb8d4a3c97c32d365187
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:02:29Z
printing for map types.
commit bbec500c4fc9a12cbc18b607147aa751308f4288
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T20:02:52Z
update test coverage, new golden
commit 35a70fbfd93b83856f86ea52bc1b3a850076960f
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T21:28:05Z
multi-letter field names.
commit 2c6deb37b104b5272d99917b6933a749da99d06e
Author: Michael Armbrust <[email protected]>
Date: 2014-02-02T21:28:23Z
improve printing compatibility.
commit 5b33216d197ad7c649e36f9f9a2a48143120aeae
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T00:21:23Z
work on decimal support.
commit 5b3d2c80546848a9c6bf830c22ec5f029dca790f
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T00:21:40Z
implement distinct.
commit 3f9e519a16f9dc9f3eabda3ad91d80c088e3f384
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T00:30:14Z
use names w/ boolean args
commit 3734a9416c1156030a7c2af9e43d9209ca17aa59
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T00:31:03Z
only quote string types.
commit 5e54aa6dab3e3ed0f2e702abc038eee5f17fcb38
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T01:38:52Z
quotes for struct field names.
commit e4def6b2c917ebf28b3a11fc1aad690c2fddd55f
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T01:39:19Z
set dataType for HiveGenericUdfs.
commit aa430e7ba7fd748619bd4b1959ca165ec2b13a5c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T06:58:51Z
Update .travis.yml
commit 7661b6ce6b8cb1cfc816e87d0644cfc063dce921
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T07:21:24Z
blacklist machines specific tests
commit 72a003dd3dce58331205465fb43bbb9a412156c4
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T07:41:45Z
revert regex change
commit 9c0677866e24293525602a8e76860b4785950c39
Author: Michael Armbrust <[email protected]>
Date: 2014-02-03T08:11:21Z
fix serialization issues, add JavaStringObjectInspector.
commit 92e415878439ceb94e3d41de75bc26acfe92a24d
Author: Reynold Xin <[email protected]>
Date: 2014-02-03T18:30:55Z
Merge pull request #32 from marmbrus/tooManyProjects
Fix a bug in PreInsertionCasts rule.
commit 692a4779af0a269ae1f16006ab129c00af2a6c5c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:36:48Z
Support for wrapping arrays to be written into hive tables.
commit ac9d7de4f973d4809d435d098def4de12c1c0dbc
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:37:06Z
Resolve *s in Transform clauses.
commit 7a0f543431b196f78da2f473fd2f0d3e3764d0c3
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:37:21Z
Avoid propagating types from unresolved nodes.
commit 010accb872f179b97b6cc6e971a7e9f17ec2de73
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:37:39Z
add tinyint to metastore type parser.
commit e7933e912356e686ce36cc8a52dc813a7cc8c430
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:38:13Z
fix casting bug when working with fractional expressions.
commit 25288d055a0bcf251e64c8653442f1ee5b466e70
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:38:38Z
Implement [] for arrays and maps.
commit ab9a131818884dd2258174956fdca65bd14dfd42
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:38:58Z
when UDFs fail they should return null.
commit 1679554ae68dfc91212ebaf8401efaf6088d61a9
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:39:12Z
add toString for if and IS NOT NULL.
commit ab5bff387f2ced791527b4c20b2c30dc7da6c190
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T02:39:28Z
Support for get item of map types.
commit 42ec4af79020a5952bf59a5e44d6852eef5d4b41
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T03:07:17Z
improve complex type support in hive udfs/udafs.
commit 44d343ca60aa1fbcd78217a39ea86a74098e0ef3
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T03:09:38Z
Merge remote-tracking branch 'databricks/master' into complex
Conflicts:
src/main/scala/catalyst/analysis/Analyzer.scala
commit e3c10bd5649658995c3a347ebe1ab434fad50cdc
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T08:57:55Z
update whitelist.
commit 389525dedbc7c6c83d6686a7661c98354f60425e
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T18:44:35Z
update golden, blacklist mr.
commit 2f276049070ccd873368441e652c0d6a2d3e2551
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T19:23:12Z
Address comments / style errors.
commit cb57459ce009bdf8e58e7eaf1c301279b5a07ce7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-04T19:24:33Z
blacklist machine specific test.
commit 67128b8bf07a5deaacd1a9214c1fa58d0bfcba85
Author: Reynold Xin <[email protected]>
Date: 2014-02-04T21:16:20Z
Merge pull request #30 from marmbrus/complex
Initial support for reading / accessing / printing nested fields.
commit b4be6a5411cd3d25919bc71563da44638660ecb6
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T00:53:46Z
better logging when applying rules.
commit ccdb07a18c62c7c955400e3253d81adbd6e8f42e
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T00:54:23Z
Fix bug where averages of strings are turned into sums of strings. Remove
a blank line.
commit d8cb805193f7d8ffe96efc423bb86f781ea3ef41
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T01:50:48Z
Implement partial aggregation.
commit f94345cb0ed64b8566da623e765a04cac6739733
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T02:44:38Z
fix doc link
commit e1999f927a41eae4a9affe2728296a1a9ee06cb8
Author: Yin Huai <[email protected]>
Date: 2014-02-05T04:38:11Z
Use Deserializer and Serializer instead of AbstractSerDe.
commit 32b615b52e7c202b29e1242952092d09f3332745
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T09:36:12Z
add override to asPartial.
commit 883006dd16cbd1ddb61f164ad28a8237f4c6becc
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T09:46:39Z
improve tests.
commit cab1a84b4811064fe217b0cd56d3fe9c48210b6a
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T10:01:08Z
Fix PartialAggregate inheritance.
commit dc6353be64bfe9c6522403a5a4124423cd62e22b
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T10:03:58Z
turn off deprecation
commit 8017afb101b214635dcd1b372afcd21379c340f5
Author: Michael Armbrust <[email protected]>
Date: 2014-02-05T18:40:23Z
fix copy paste error.
commit 5479066a011a8dff6da8c68c8452cdeffb4cc3e8
Author: Reynold Xin <[email protected]>
Date: 2014-02-05T19:22:52Z
Merge pull request #36 from marmbrus/partialAgg
Implement partial aggregation.
commit 5e4d9b453658dece7afa987ab9b07bf2c12b4999
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T00:16:19Z
Merge pull request #35 from marmbrus/smallFixes
A few small bug fixes and improvements.
commit 02ff8e4462793d8f37365f44cb2f269f619d72da
Author: Yin Huai <[email protected]>
Date: 2014-02-07T13:41:42Z
Correctly parse the db name and table name in a CTAS query.
commit 8841eb888d16edbb1bd34175ee13b664468e78b7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:01:51Z
Rename Transform -> ScriptTransformation.
commit acb956646de2a05475ff5086b5967e0e657f8aa0
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:03:56Z
Correctly type attributes of CTAS.
commit 016b48990ef37b32d1bd4b1d4790afbe15e7db57
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:04:17Z
fix typo.
commit bea4b7f1c3b091386bb8cacad8f8c2e154c579b7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:04:40Z
Add SumDistinct.
commit ea76cf9bf5e07dfa5435fa99ae1e0623a7c89262
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:05:13Z
Add NoRelation to planner.
commit dd00b7e8df7356be40379ec560f2f476f74e1a8e
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:11:33Z
initial implementation of generators.
commit ba8897fd60a6555d2a52ea5fb3d8c32981ed2296
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:12:16Z
Merge remote-tracking branch 'yin/parseDBNameInCTAS' into lateralView
commit 0ce61b0f3d110567693bb340df6f5bdd6ee41a2c
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:44:19Z
Docs for GenericHiveUdtf.
commit 740febb71c94e40f436cb3ea5ebc81b0cda4db26
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T22:44:33Z
Tests for tgfs.
commit db92adc5ff5a0712d5104aad00cad67b520070b4
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T23:58:28Z
more tests passing. clean up logging.
commit ff5ea3f209eed028365a2b680dd7093340e355c8
Author: Michael Armbrust <[email protected]>
Date: 2014-02-07T23:59:41Z
new golden
commit 5cc367cdb9946b092c53ff1473ac3f784c0112d3
Author: Michael Armbrust <[email protected]>
Date: 2014-02-08T01:34:34Z
use berkeley instead of cloudbees
commit b376d15652bd0372d1713429468d874614a9dd7a
Author: Michael Armbrust <[email protected]>
Date: 2014-02-08T01:42:32Z
fix newlines at EOF
commit 7123225ae5e96dc7be38b13c2f2bcc86a19249ad
Author: Yin Huai <[email protected]>
Date: 2014-02-08T01:44:01Z
Correctly parse the db name and table name in INSERT queries.
commit 2897deb146c498bfc7ebcb80e3835ecb9899cfeb
Author: Michael Armbrust <[email protected]>
Date: 2014-02-08T02:31:20Z
fix scaladoc
commit 0e6c1d712f95ce0268dc71b28a64c2bd29c81b27
Author: Reynold Xin <[email protected]>
Date: 2014-02-08T06:40:54Z
Merge pull request #38 from yhuai/parseDBNameInCTAS
Correctly parse the db name and table name of a table
commit 341116cb450ff72af793a5bd84d73ca2203200cb
Author: Michael Armbrust <[email protected]>
Date: 2014-02-08T20:09:59Z
address comments.
commit 7785ee62e47c93390213ff3f1a8a67a293d878a6
Author: Michael Armbrust <[email protected]>
Date: 2014-02-10T23:14:49Z
Tighten visibility based on comments.
commit 964368f3b21c79ec86eb7c0389c43768fb4c1b01
Author: Michael Armbrust <[email protected]>
Date: 2014-02-11T00:04:01Z
Merge pull request #39 from marmbrus/lateralView
Add support for lateral views, TGFs and Hive UDTFs
commit dce0593034a30b802d9be2cf98590e9955df1b47
Author: Michael Armbrust <[email protected]>
Date: 2014-02-11T00:04:56Z
move golden answer to the source code directory.
commit 9329820a9a85697a9bfad11b6f7266c07eb59235
Author: Michael Armbrust <[email protected]>
Date: 2014-02-11T00:28:23Z
add golden answer files to repository
commit a7ad05855a376af7c7cdb89bb114cccba9e6b9b1
Author: Michael Armbrust <[email protected]>
Date: 2014-02-11T02:02:05Z
Merge pull request #40 from marmbrus/includeGoldens
Include golden hive answers in the source repository
commit 2407a21180d261138454d23926786dcc20e88d1e
Author: Lian, Cheng <[email protected]>
Date: 2014-02-12T00:29:11Z
Added optimized logical plan to debugging output
commit cf691df0b020840be8bfaf0e29a7db4ef049b6f6
Author: Lian, Cheng <[email protected]>
Date: 2014-02-12T00:30:14Z
Added the PhysicalOperation to generalize ColumnPrunings
commit f235914e3572919f5cb056b8a6794eb0623f5617
Author: Lian, Cheng <[email protected]>
Date: 2014-02-12T09:14:22Z
Test case udf_regex and udf_like need BooleanWritable registered
commit f0c3742583d9a99bfc0f36c4fe9e2a497412c580
Author: Lian, Cheng <[email protected]>
Date: 2014-02-12T09:23:07Z
Refactored PhysicalOperation
The old version is implemented in a top down tail recursive manner, which
cannot cover an uncommon corner case like:
Filter (with aliases)
Project ...
MetastoreRelation
In this case, the aliases are not in-lined/substituted because no aliases
are collected yet. It is now covered by the new version which is implemented
in a bottom up recursive manner and collects all necessary aliases before
in-lining/substitution.
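The corner case described in the commit message above can be illustrated with a small sketch in plain Scala. This is not the code from the commit: `Plan`, `Project`, `Filter`, `Relation`, `Attr`, `Alias`, `GreaterThan`, `Lit`, `substitute` and `collect` are hypothetical, simplified stand-ins. It assumes only what the message states, namely that recursing into the child first makes the aliases of a lower Project available when the Filter above it is processed.

```scala
// Hypothetical, simplified plan and expression types; not the actual Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Alias(child: Expr, name: String) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(projectList: Seq[Expr], child: Plan) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan

object PhysicalOperationSketch {
  // Replace references to aliased names with the expressions they stand for.
  def substitute(aliases: Map[String, Expr])(e: Expr): Expr = e match {
    case Attr(n) if aliases.contains(n) => aliases(n)
    case Alias(c, n)                    => Alias(substitute(aliases)(c), n)
    case GreaterThan(l, r) =>
      GreaterThan(substitute(aliases)(l), substitute(aliases)(r))
    case other => other
  }

  // Bottom-up: the child is processed first, so aliases defined lower in the
  // plan are already collected when an operator above them is visited.
  // Returns (filter predicates, underlying relation, aliases collected so far).
  def collect(plan: Plan): (Seq[Expr], Plan, Map[String, Expr]) = plan match {
    case Project(list, child) =>
      val (filters, leaf, aliases) = collect(child)
      val newAliases = list.collect { case Alias(c, n) => n -> substitute(aliases)(c) }
      (filters, leaf, aliases ++ newAliases)
    case Filter(cond, child) =>
      val (filters, leaf, aliases) = collect(child)
      (filters :+ substitute(aliases)(cond), leaf, aliases)
    case leaf =>
      (Seq.empty[Expr], leaf, Map.empty[String, Expr])
  }

  def main(args: Array[String]): Unit = {
    // Filter(b > 1) over Project(a AS b) over a relation named t.
    val plan = Filter(
      GreaterThan(Attr("b"), Lit(1)),
      Project(Seq(Alias(Attr("a"), "b")), Relation("t")))
    val (filters, leaf, _) = collect(plan)
    println(filters) // prints List(GreaterThan(Attr(a),Lit(1)))
    println(leaf)    // prints Relation(t)
  }
}
```

In this sketch the filter on the alias `b` is rewritten to refer to the underlying column `a`, which is the substitution a top-down pass would miss because it visits the Filter before the Project that defines the alias.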
commit 5720d2bd2cd08c2ecbff32391ed88080cecd7359
Author: Lian, Cheng <[email protected]>
Date: 2014-02-12T09:39:09Z
Fixed comment typo
commit bc9a12ce63f14f34aa9d74086f3485a6d338cf66
Author: Michael Armbrust <[email protected]>
Date: 2014-02-13T23:18:26Z
Move hive test files.
commit 7588a57feb1870c718be645e428d1f2371b9e722
Author: Michael Armbrust <[email protected]>
Date: 2014-02-13T23:19:28Z
Break into 3 major components and move everything into the
org.apache.spark.sql package.
commit 1f7d00aab0b9bd56dd4e4b71c9979f9e4e559d8b
Author: Reynold Xin <[email protected]>
Date: 2014-02-14T06:29:29Z
Merge pull request #41 from marmbrus/splitComponents
Break catalyst into 3 major components and move everything into
org.apache.spark.sql
commit 887f928aac6f649ed5f97c644dafd715a9b450a4
Author: Yin Huai <[email protected]>
Date: 2014-02-14T10:38:57Z
Merge remote-tracking branch 'upstream/master' into SerDeNew
commit 678341a50b793b09658b823fa1bdc61a9293d770
Author: Mark Hamstra <[email protected]>
Date: 2014-02-14T18:21:24Z
Replaced non-ascii text
commit 5ae010ff20ed811962e6f13920d1ef43bfc2a14b
Author: Michael Armbrust <[email protected]>
Date: 2014-02-14T19:14:33Z
Merge pull request #42 from markhamstra/non-ascii
Replaced non-ascii text
commit 1f6260d77223aaf23c2bbb112b52803bea061e42
Author: Lian, Cheng <[email protected]>
Date: 2014-02-14T20:45:29Z
Fixed package name and test suite name in Makefile
commit b6de691f13d66dadc7b72c9eb19acccaf75b8ee9
Author: Michael Armbrust <[email protected]>
Date: 2014-02-14T22:15:35Z
Merge pull request #43 from liancheng/fixMakefile
Fixed package name and test suite name in Makefile
commit 7f206b5aa577bc4ca8aeb82d2438ad43316eb996
Author: Michael Armbrust <[email protected]>
Date: 2014-02-14T22:34:23Z
Add support for hive TABLESAMPLE PERCENT.
commit ed3a1d15b80768817e9259e31499df53587c51b2
Author: Yin Huai <[email protected]>
Date: 2014-02-14T23:45:32Z
Load data directly into Hive.
commit 59e37a31efba400649685c4cedf648d1b0c86d0b
Author: Yin Huai <[email protected]>
Date: 2014-02-14T23:56:06Z
Merge remote-tracking branch 'upstream/master' into SerDeNew
Conflicts:
build.sbt
shark/src/main/scala/org/apache/spark/sql/shark/HiveMetastoreCatalog.scala
commit 346f828dc37df3a1681e6ebf2a5940a609ead50a
Author: Yin Huai <[email protected]>
Date: 2014-02-15T00:38:52Z
Move SharkHadoopWriter to the correct location.
commit a9c318853d4bb02965252810656999be060682dd
Author: Timothy Chen <[email protected]>
Date: 2014-02-15T01:06:00Z
Fix udaf struct return
commit 69adf7298edb74a9ecd704932276d988d1c8ba5d
Author: Yin Huai <[email protected]>
Date: 2014-02-15T01:22:13Z
Set cloneRecords to false.
commit 566fd6685fec88b88223f4b47af04eb39a69d28e
Author: Timothy Chen <[email protected]>
Date: 2014-02-15T02:09:30Z
Whitelist tests and add support for Binary type
commit 9ad474d877ae1a6dcc6a7769c2effed4c3a15029
Author: Michael Armbrust <[email protected]>
Date: 2014-02-15T02:56:30Z
Merge pull request #44 from marmbrus/sampling
Add support for hive TABLESAMPLE PERCENT.
commit 3cb4f2e16662c54806474d0de2fbd9021133ae08
Author: Michael Armbrust <[email protected]>
Date: 2014-02-15T02:57:29Z
Merge pull request #45 from tnachen/master
Fix udaf struct return
commit 8506c176f7e18011df50e25f8ea98d30a57f0ccd
Author: Michael Armbrust <[email protected]>
Date: 2014-02-15T03:20:41Z
Address review feedback.
commit 3bb272ddc69472120bb0915308451576565cecf6
Author: Michael Armbrust <[email protected]>
Date: 2014-02-15T03:26:42Z
move org.apache.spark.sql package.scala to the correct location.
commit 1596e1b14e8e2741758c6370bb29d32830476a7f
Author: Yin Huai <[email protected]>
Date: 2014-02-15T04:09:25Z
Cleanup imports to make IntelliJ happy.
commit 5495faba864ee7ef1f8649bca02eacb7479a3b2a
Author: Yin Huai <[email protected]>
Date: 2014-02-15T10:01:02Z
Remove cloneRecords which is no longer needed.
commit bdab5edd65140cd18c2dc29b00fa914d624dd999
Author: Yin Huai <[email protected]>
Date: 2014-02-15T10:03:28Z
Add a TODO for loading data into partitioned tables.
commit 35c9a8a11fed8ae8f7aa8d345b4bc0c53f413ab8
Author: Michael Armbrust <[email protected]>
Date: 2014-02-15T20:57:39Z
Merge pull request #46 from marmbrus/reviewFeedback
Address review feedback from previous PR.
commit 563bb22bd30b021e2bc276e2ed454f5296877a63
Author: Yin Huai <[email protected]>
Date: 2014-02-16T00:26:05Z
Set compression info in FileSinkDesc.
commit e08962779a195b991c2478647c65923f4ddd23b4
Author: Yin Huai <[email protected]>
Date: 2014-02-16T00:26:23Z
Code style.
commit 45ffb86df7c877c78de0470fbb66fae6be3bcf23
Author: Yin Huai <[email protected]>
Date: 2014-02-16T00:28:11Z
Merge remote-tracking branch 'upstream/master' into SerDeNew
commit eea75c522fbf9ead1ef4280e3420d3a6685b7a0c
Author: Yin Huai <[email protected]>
Date: 2014-02-16T11:24:15Z
Correctly set codec.
commit 428aff5f15a1954a983f049ade8986816d87e73c
Author: Yin Huai <[email protected]>
Date: 2014-02-16T12:39:24Z
Distinguish `INSERT INTO` and `INSERT OVERWRITE`.
commit a40d6d628384c172c1d1d7a4bd4011c3cb8f2b6b
Author: Yin Huai <[email protected]>
Date: 2014-02-16T14:09:23Z
Loading the static partition specified in a INSERT INTO/OVERWRITE query.
commit 334aacee2432fbc6c51644df08f4899d340a2ef4
Author: Yin Huai <[email protected]>
Date: 2014-02-16T14:11:45Z
New golden files.
commit d00260be188368ce943f2ffe7d087a7eff2f5f41
Author: Yin Huai <[email protected]>
Date: 2014-02-17T00:26:19Z
Strips backticks from partition keys.
commit 555fb1d1e965d19c6e7dc28027361868b3492c0f
Author: Yin Huai <[email protected]>
Date: 2014-02-17T06:51:16Z
Correctly set the extension for a text file.
commit feb022c1e77aac1f6b224cfc56bfd851762a0ca6
Author: Yin Huai <[email protected]>
Date: 2014-02-17T06:51:55Z
Partitioning key should be case insensitive.
commit a1a47760b718bfecc7e4b1adacb3a179f936825c
Author: Yin Huai <[email protected]>
Date: 2014-02-17T10:46:13Z
Update comments.
commit 017872cef3d771acab5fb3efc570dc1798e44f6d
Author: Yin Huai <[email protected]>
Date: 2014-02-17T10:46:31Z
Remove stats20 from whitelist.
commit 128a9f8b8082b3ed0659dfe6c41dbd7cbf04ff71
Author: Yin Huai <[email protected]>
Date: 2014-02-18T04:58:08Z
Minor changes.
commit f670c8c7adf6a3bc5c1e20850070b15e041f9285
Author: Yin Huai <[email protected]>
Date: 2014-02-18T09:35:01Z
Throw a NotImplementedError for not supported clauses in a CTAS query.
commit c5a4fabbe9a67c0bc3063314f7c5efd001aba52d
Author: Lian, Cheng <[email protected]>
Date: 2014-02-16T13:39:24Z
Merge branch 'master' into columnPruning
Conflicts:
shark/src/test/scala/org/apache/spark/sql/shark/execution/HiveQuerySuite.scala
shark/src/test/scala/org/apache/spark/sql/shark/execution/PartitionPruningSuite.scala
src/main/scala/catalyst/execution/FunctionRegistry.scala
src/main/scala/catalyst/execution/SharkInstance.scala
src/main/scala/catalyst/execution/planningStrategies.scala
commit 2682f72adde85870de6b7bc20e0df0622340cdb0
Author: Lian, Cheng <[email protected]>
Date: 2014-02-18T12:14:06Z
Merge remote-tracking branch 'origin/master' into columnPruning
commit 54f165b5f8814b9a9572f315b17505ef896b723a
Author: Lian, Cheng <[email protected]>
Date: 2014-02-18T12:19:26Z
Fixed spelling typo in two golden answer file names
commit cf4db596d1ef8edcaa4f5e42648ddc57e4dc38e6
Author: Lian, Cheng <[email protected]>
Date: 2014-02-18T16:32:20Z
Added golden answers for PruningSuite
commit f22df3aa73b75babca50ee0884bd064497bfe836
Author: Michael Armbrust <[email protected]>
Date: 2014-02-18T19:05:19Z
Merge pull request #37 from yhuai/SerDe
Support ORCSerDe
commit 9990ec7dcce26174f326172f1d662cc758d4e130
Author: Michael Armbrust <[email protected]>
Date: 2014-02-18T19:07:34Z
Merge pull request #28 from liancheng/columnPruning
Column pruning optimization together with some minor refactoring
commit 29effadbc188c5e6604a9e3a7460d9abde2c2fce
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:30:20Z
Include alias in attributes that are produced by overridden tables.
commit c9116a6aa873e88c6b72d6ddc5d935af7c083f15
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:31:16Z
Add combiner to avoid NPE when spark performs external aggregation.
commit 8c01c2475ef87d589263ba215f26530346b9868d
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:31:42Z
Move definition of Row out of execution to top level sql package.
commit 4905b2b0b5f5cc8c123b41ccbb2daec117f73fad
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:33:17Z
Add more efficient TopK that avoids global sort for logical Sort =>
StopAfter.
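For readers unfamiliar with the idea behind this commit, the following is a minimal sketch of a top-K computation that avoids a global sort. It is not the code from the commit: partitions are modelled as plain Scala collections rather than RDDs, and the names TopKSketch, topKOfPartition and topK are hypothetical. Each partition keeps only its k smallest elements in a bounded heap, and only those survivors are merged and sorted.

```scala
import scala.collection.mutable

// Hypothetical illustration only; not the operator added by the commit.
object TopKSketch {
  // Keep the k smallest elements of one partition using a bounded max-heap.
  def topKOfPartition[T](part: Iterator[T], k: Int)(implicit ord: Ordering[T]): Seq[T] = {
    // PriorityQueue is a max-heap, so head is the largest element currently kept.
    val heap = mutable.PriorityQueue.empty[T](ord)
    part.foreach { x =>
      if (heap.size < k) {
        heap.enqueue(x)
      } else if (k > 0 && ord.lt(x, heap.head)) {
        // x beats the worst element kept so far: evict that one and keep x.
        heap.dequeue()
        heap.enqueue(x)
      }
    }
    heap.toSeq.sorted(ord)
  }

  // Merge the per-partition candidates; at most k * numPartitions elements are sorted here.
  def topK[T](partitions: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
    partitions.flatMap(p => topKOfPartition(p.iterator, k)).sorted(ord).take(k)
}
```

With this sketch, TopKSketch.topK(Seq(Seq(5, 1, 9), Seq(3, 7), Seq(2, 8)), 3) only ever sorts the seven per-partition candidates, not the full input.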
commit 532dd3748c262cdeea2f9f7977ba3a875e8b73fe
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:34:06Z
Allow the local warehouse path to be specified.
commit a4308954350a578dae8d8d4d49ac7ec52c2d0fe7
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:34:35Z
Planning for logical Repartition operators.
commit 5fe7de411c437d958d414d5530c56aceb6f6bfc3
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:36:09Z
Move table creation out of rule into a separate function.
commit b9225114460f9d628738b690fc0b33ba81a3c019
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:37:06Z
Fix insertion of nested types into hive tables.
commit 18a861b108eb20afa1a87ee04324de829478b4d2
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:38:06Z
Correctly convert nested products into nested rows when turning scala data
into catalyst data.
commit df88f01e1d449433e2f149dbaea90a9611848ff9
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:38:44Z
add a simple test for aggregation
commit 6e04e5b944113bc2c0cb528dcac1ccf3276109e2
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T21:39:14Z
Add insertIntoTable to the DSL.
commit 24eaa79764253a2771c980728037e17bbef17b50
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T22:22:06Z
fix > 100 chars
commit d393d2abebc03408fc43dbd835105134fa256463
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T22:41:37Z
Review Comments: Add comment to map that adds a sub query.
commit 2225431005040fd6bb0b71f125057b40ef8c0493
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T23:18:21Z
Merge pull request #48 from marmbrus/minorFixes
Several minor fixes for bugs found during benchmarking.
commit 3ac941623b9b9cc860de890a781578b21b3accae
Author: Michael Armbrust <[email protected]>
Date: 2014-02-25T00:24:39Z
Merge support for working with schema-ed RDDs using catalyst in as a spark
subproject.
commit f5e7492c267758c80b7ad3e4c74b3b20b34ec9e0
Author: Michael Armbrust <[email protected]>
Date: 2014-02-25T22:44:02Z
Add Apache license. Make naming more consistent.
commit 5f2963c053f39ef4298598be918a4758c1c32a13
Author: Michael Armbrust <[email protected]>
Date: 2014-02-27T23:20:05Z
naming and continuous compilation fixes.
commit 4d57d0e7b0e929d14c9d4218d5b63a03e176d04d
Author: Michael Armbrust <[email protected]>
Date: 2014-02-27T23:37:26Z
Fix test execution on travis.
commit 7413ac22622a991eac5fba33cbaeee2008f324f0
Author: Michael Armbrust <[email protected]>
Date: 2014-02-28T00:04:41Z
make test downloading quieter.
commit 608a29ea363e4093e605b2ecdcf3d55f4109e30d
Author: Michael Armbrust <[email protected]>
Date: 2014-02-28T02:22:58Z
Add hive as a repl dependency
commit c3343868f8cc8b1054513fe6619c9bb193e8816a
Author: Michael Armbrust <[email protected]>
Date: 2014-02-24T22:29:16Z
Initial support for generating schemas based on case classes.
commit b33e47ede48e9803fe213ec71d9a3ccea804b69a
Author: Andre Schumacher <[email protected]>
Date: 2014-02-16T14:09:02Z
First commit of Parquet import of primitive column types
commit 99a920916fa7f03669d86a9b9cf7482fedcaf318
Author: Andre Schumacher <[email protected]>
Date: 2014-02-16T17:54:44Z
Expanding ParquetQueryTests to cover all primitive types
commit eb0e521572c500e79de2dc5c3aa188b222490681
Author: Andre Schumacher <[email protected]>
Date: 2014-02-17T13:28:37Z
Fixing package names and other problems that came up after the rebase
commit 6ad05b34ecf9d457fd95c8e7f8f74ed979048cb9
Author: Andre Schumacher <[email protected]>
Date: 2014-02-19T11:06:53Z
Moving ParquetRelation to spark.sql core
commit a11e36428f3ea166825cbeb39ea23e86046dd26a
Author: Andre Schumacher <[email protected]>
Date: 2014-02-19T14:12:30Z
Adding Parquet RowWriteSupport
commit 0f17d7b6fcea76b991da1790cf39b97d5543eee1
Author: Andre Schumacher <[email protected]>
Date: 2014-02-19T14:26:55Z
Rewriting ParquetRelation tests with RowWriteSupport
commit 6a6bf9844e1c25e3f3360cc4c479f5db66e2bea7
Author: Andre Schumacher <[email protected]>
Date: 2014-02-19T16:31:40Z
Added column projections to ParquetTableScan
commit f347273cb9d8f6e6c43eb3ef5e54507025ecc1cd
Author: Andre Schumacher <[email protected]>
Date: 2014-02-20T17:01:37Z
Adding ParquetMetaData extraction, fixing schema projection
commit 75262eec5e21400011359dbf3f2825cbd7be461d
Author: Andre Schumacher <[email protected]>
Date: 2014-02-24T09:27:25Z
Integrating operations on Parquet files into SharkStrategies
commit 18fdc441ab3fc17535512f86cb77651d91596bdd
Author: Andre Schumacher <[email protected]>
Date: 2014-02-26T10:12:15Z
Reworking Parquet metadata in relation and adding CREATE TABLE AS for
Parquet tables
commit 3a0a552a5950f99f80bc178818103e393cfa775c
Author: Andre Schumacher <[email protected]>
Date: 2014-02-26T12:55:31Z
Reorganizing Parquet table operations
commit 332119573ba934e7fd8cb1f7adcd0d3bd791a1c2
Author: Andre Schumacher <[email protected]>
Date: 2014-02-27T07:41:21Z
Fixing one import in ParquetQueryTests.scala
commit 61e3bfbbb2fe4894fa5c2d7c27f1da6cec903819
Author: Andre Schumacher <[email protected]>
Date: 2014-03-02T11:45:59Z
Adding WriteToFile operator and rewriting ParquetQuerySuite
commit c863bed3d17abf9cd3da7cee8637d77b088a192d
Author: Andre Schumacher <[email protected]>
Date: 2014-03-02T14:28:23Z
Codestyle checks
commit 3ac9eb05d0cec3cca166503cb4dc417168694012
Author: Andre Schumacher <[email protected]>
Date: 2014-03-02T18:23:06Z
Rebasing to new main branch
commit 3bda72db9384b0f67cfbfbe22eb2674be113ceda
Author: Andre Schumacher <[email protected]>
Date: 2014-03-02T20:59:23Z
Adding license banner to new files
commit d7fbc3a591110dae76121c1095a32ab4788ae005
Author: Michael Armbrust <[email protected]>
Date: 2014-02-27T02:00:12Z
Several performance enhancements and simplifications of the expression
evaluation framework.
* Removed the Evaluate singleton in favor of placing expression evaluation
code in each expression.
* Instead of passing in a Seq of input rows we now take a single row. A
mutable JoinedRow wrapper can be used in the relatively rare cases where
expressions need to be evaluated on multiple input rows.
* GenericRow now takes a raw Array[Any] instead of a Seq. Since GenericRow
itself is a Seq wrapper, this avoids the creation of an unnecessary object.
* A new concept called MutableLiteral can be used to evaluate aggregate
expressions in-place, instead of needing to build new literal trees for each
update. This part is more of a WIP as we still incur boxing, however this is a
strict improvement over what was there before.
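The design described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the Catalyst source: the Row, GenericRow, JoinedRow, BoundReference, Literal and Add definitions below are simplified stand-ins written for this example, and the Add here handles only Int operands. It shows evaluation code living in each expression, eval taking a single input row, and a mutable wrapper presenting two rows as one without copying.

```scala
// Hypothetical, simplified stand-ins; the real Catalyst classes differ.
object EvalSketch {
  trait Row { def apply(i: Int): Any }

  // Wraps a raw array directly instead of a Seq.
  final class GenericRow(values: Array[Any]) extends Row {
    def apply(i: Int): Any = values(i)
  }

  // Mutable view over two rows; left and right can be swapped in place between evaluations.
  final class JoinedRow(var left: Row, leftSize: Int, var right: Row) extends Row {
    def apply(i: Int): Any = if (i < leftSize) left(i) else right(i - leftSize)
  }

  // Each expression carries its own evaluation code and sees exactly one input row.
  sealed trait Expression { def eval(input: Row): Any }

  case class BoundReference(ordinal: Int) extends Expression {
    def eval(input: Row): Any = input(ordinal)
  }

  case class Literal(value: Any) extends Expression {
    def eval(input: Row): Any = value
  }

  // Int-only addition, to keep the sketch short.
  case class Add(l: Expression, r: Expression) extends Expression {
    def eval(input: Row): Any =
      l.eval(input).asInstanceOf[Int] + r.eval(input).asInstanceOf[Int]
  }
}
```

An expression that needs columns from two inputs, such as a join condition, would be evaluated against a JoinedRow, while every other expression is handed a single row directly.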
commit 296fe5036105b7e519501f58e0fb0204023c23f2
Author: Michael Armbrust <[email protected]>
Date: 2014-02-27T20:30:56Z
Address review feedback.
commit 6fdefe65478d950d3f30f6591df361558886d187
Author: Michael Armbrust <[email protected]>
Date: 2014-03-03T20:33:45Z
Port sbt improvements from master.
commit da9afbda89776602acb5dfa10d1c0a654f9d77dd
Author: Michael Armbrust <[email protected]>
Date: 2014-03-03T20:43:32Z
Add byte wrappers for Hive UDFs.
commit 7b9d14263a4cbf5d39216c86a41b546c607b4a20
Author: Michael Armbrust <[email protected]>
Date: 2014-03-03T19:41:35Z
Update travis to increase permgen size.
commit 99e61fbfa386dc11f4b0df2134d8b714c57ad3ba
Author: Michael Armbrust <[email protected]>
Date: 2014-03-03T21:36:20Z
Merge pull request #51 from marmbrus/expressionEval
Several performance enhancements and simplifications of the expression
evaluation framework.
commit 8d5da5ed977b1c867b5b78f05523d89d5552b387
Author: Michael Armbrust <[email protected]>
Date: 2014-02-27T03:02:17Z
modify compute-classpath.sh to include datanucleus jars explicitly
commit 6d315bb168443eba98d978ae65c386ff27629bfc
Author: Cheng Lian <[email protected]>
Date: 2014-03-05T03:48:37Z
Added Row.unapplySeq to extract fields from a Row object.
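As a rough illustration of what an unapplySeq extractor enables, the sketch below defines a stand-in Row class (hypothetical, not the Spark SQL Row) whose companion object exposes unapplySeq, so that fields can be pulled out by pattern matching. The describe helper is likewise invented for this example.

```scala
// Hypothetical stand-in; the actual Row in the sql package is defined differently.
object RowSketch {
  final class Row(val values: Seq[Any])

  object Row {
    def apply(values: Any*): Row = new Row(values)
    // Exposing the fields as a sequence lets patterns bind any number of them.
    def unapplySeq(row: Row): Option[Seq[Any]] = Some(row.values)
  }

  def describe(row: Row): String = row match {
    // Matches only two-field rows whose fields have these runtime types.
    case Row(name: String, age: Int) => s"$name is $age"
    // Fallback that binds all remaining fields as a sequence.
    case Row(other @ _*) => s"unexpected row with ${other.length} fields"
  }
}
```

The first case binds and type-checks both fields in one step, which is the convenience an extractor of this shape provides over indexing into the row by position.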
commit 70e489d277470b5ed84d856af96b1167a0f892b6
Author: Cheng Lian <[email protected]>
Date: 2014-03-05T04:13:19Z
Fixed a spelling typo
commit 1ce01c7ad99d6c5d666c8b601c8f3527ab0ebe9f
Author: Michael Armbrust <[email protected]>
Date: 2014-03-05T08:59:26Z
Merge pull request #56 from liancheng/unapplySeqForRow
Added Row.unapplySeq to extract fields from a Row object.
commit 0040ae6d53e4298402b1ddcbcbcea6bc2b78e7d7
Author: Andre Schumacher <[email protected]>
Date: 2014-03-05T09:11:54Z
Feedback from code review
commit 9d419a632ace9064519b83f28d851dbd2707e99c
Author: Michael Armbrust <[email protected]>
Date: 2014-03-05T19:23:51Z
Merge remote-tracking branch 'catalyst/catalystIntegration' into
parquet_support
commit 7d0f13e9c8a2c336a2089affaad594943573577d
Author: Michael Armbrust <[email protected]>
Date: 2014-03-05T19:28:03Z
Update parquet support with master.
commit 3c3f9624a4c3041a0d8b68bc4e218ea6e0eef769
Author: Michael Armbrust <[email protected]>
Date: 2014-03-05T20:17:34Z
Fix a bug due to array reuse. This will need to be revisited after we
merge the mutable row PR.
commit c9f8fb3fbb6b45ede70c7b2e285668fdf1e48582
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T01:11:30Z
Merge pull request #53 from AndreSchumacher/parquet_support
Parquet support
commit d37139320dd35c91c22903a919aa177ae68e4cf7
Author: Michael Armbrust <[email protected]>
Date: 2014-03-05T02:54:21Z
Add a framework for dealing with mutable rows to reduce the number of
object allocations that occur in the critical path.
commit 959bdf0bb5362d6387e1748dd16b62f6abfe4801
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T02:05:25Z
Don't silently swallow all KryoExceptions, only the one that indicates the
end of a stream.
commit 9049cf0d432662cb40c7e31688049d9a1db6e732
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T02:06:53Z
Extend MutablePair interface to support easy syntax for in-place updates.
Also add a constructor so that it can be serialized out-of-the-box.
commit d9943336fda9c31fda202ed13e5c06b074214539
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T02:08:15Z
Remove copies before shuffle, this required changing the default shuffle
serialization.
commit ba28849fa9ec163dc39889cd7f3d683f28692b33
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T02:23:05Z
code review comments.
commit c2a658d1d18ee821d83b89de43992f444a0d5dbb
Author: Michael Armbrust <[email protected]>
Date: 2014-03-06T18:07:38Z
Merge pull request #55 from marmbrus/mutableRows
Add a framework for dealing with mutable rows.
commit 54637ecce8ea9a9af3b41ce4a7a719249bcff2f2
Author: Andre Schumacher <[email protected]>
Date: 2014-03-09T19:11:58Z
First part of second round of code review feedback
commit 5bacdc0e5c18bc6a4aee6bc2da8ac8d2a29751a0
Author: Andre Schumacher <[email protected]>
Date: 2014-03-09T20:35:39Z
Moving towards mutable rows inside ParquetRowSupport
commit 7ca4b4e34d466fd64243b80300fab28af09936e9
Author: Andre Schumacher <[email protected]>
Date: 2014-03-11T17:56:40Z
Improving checks in Parquet tests
commit aeaef544dda49dae87385f8bdd31e2a61719dfd2
Author: Andre Schumacher <[email protected]>
Date: 2014-03-11T18:33:00Z
Removing unnecessary Row copying and reverting some changes to MutableRow
commit 7386a9f386298d8428055cfae5784f78cac44ada
Author: Michael Armbrust <[email protected]>
Date: 2014-03-11T18:34:45Z
Initial example programs using spark sql.
commit f0ba39efd308339293b8cd4e397731f4b959ff65
Author: Michael Armbrust <[email protected]>
Date: 2014-03-11T18:54:52Z
Merge remote-tracking branch 'origin/master' into maven
Conflicts:
project/SparkBuild.scala
sbt/sbt-launch-lib.bash
commit 7233a7452fc36d3a9d7e7afcd560e9aad73bbf6c
Author: Michael Armbrust <[email protected]>
Date: 2014-03-11T22:19:08Z
initial support for maven builds
commit 3447c3edb7a83163a5668c68a246bc04216a0e71
Author: Michael Armbrust <[email protected]>
Date: 2014-03-13T19:15:50Z
Don't override the metastore / warehouse in non-local/test hive context.
commit 3386e4fd6715c133c5fb04e7b5b3d59af4b2ae53
Author: Michael Armbrust <[email protected]>
Date: 2014-03-13T19:32:06Z
Merge pull request #58 from AndreSchumacher/parquet_fixes
Parquet fixes
commit 1a4bbd9f2b471e67d99cfa3e9a62406ed1b29723
Author: Michael Armbrust <[email protected]>
Date: 2014-03-13T20:51:55Z
Merge pull request #60 from marmbrus/maven
Basic support for maven, update spark.
commit f93aa39fdd3cabc3377c92bc650a6f23469c3291
Author: Andre Schumacher <[email protected]>
Date: 2014-03-14T16:25:21Z
Better handling of path names in ParquetRelation
Previously incomplete path names (with missing URI field) were passed
to Parquet. Also two rules were moved from HiveStrategies to
SparkStrategies.
commit 5d710747a2f334755bf8a72ff841e42d9344299b
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T16:59:46Z
Merge pull request #62 from AndreSchumacher/parquet_file_fixes
Better handling of path names in ParquetRelation
commit 8b35e0ac28080a4470d7e7eb6d0d3145de12d4e2
Author: Michael Armbrust <[email protected]>
Date: 2014-03-13T20:53:54Z
address feedback, work on DSL
commit d2d9678a63ffa61d5a2abd37bb667371ce8641ba
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T02:08:27Z
Make sure hive isn't in the assembly jar. Create a separate, optional Hive
assembly that is used when present.
commit 9eb029405a8ba39fe7b40736702ce1443b9b149c
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T02:50:43Z
Bring expressions implicits into SqlContext.
commit f7d992db7ba126455069f48ce3fef2f95544095d
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T05:48:59Z
Naming / spelling.
commit ce8073b32d5a8713c5ad494baa1026c103e2882d
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T06:25:59Z
clean up implicits.
commit 2f224546a0c3e0713de359727e92d727bd41091e
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T06:26:15Z
WIP: Parquet example.
commit c01470fa14e75fbbea72b0c244515d1f2cdb26cb
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T17:07:50Z
Clean up example
commit 013f62a2eb59e76510d06d6e8b2ab6a882bdb598
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T17:31:34Z
Fix documentation / code style.
commit c2efad69d2013c4a8557874b9b1260ea7ae8dafc
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T20:14:01Z
First draft of SQL documentation.
commit e5e1d6bc80ce4faf4965b140c931ec1c277874bd
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T20:14:24Z
Remove travis configuration.
commit 1d0eb63b2a0f0cee2924287c583e1c62a9a83784
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T20:28:40Z
update changes with spark core
commit 6978dd8ed0b242103bb4af4c6c7c031d960b1285
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T21:03:34Z
update docs, add apache license
commit 9dffbfa855128e31b3bed95fa9deec8fea85710a
Author: Michael Armbrust <[email protected]>
Date: 2014-03-14T21:51:25Z
Style fixes. Add downloading of test cases to jenkins.
commit adcf1a46fe02dbc3b32c8997ebf50af0e5ff1555
Author: Henry Cook <[email protected]>
Date: 2014-03-14T23:14:10Z
Update sql-programming-guide.md
Minor typos