[ 
https://issues.apache.org/jira/browse/CRUNCH-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282636#comment-16282636
 ] 

Micah Whitacre commented on CRUNCH-340:
---------------------------------------

Some comments:
* It'd be nice to redo the test setup to follow what crunch-core does to avoid 
you having to deal with custom failsafe/surefire config in the project.  (e.g. 
make the suite end with "IT.java" and rename the individual tests in that suite 
to something that doesn't match on either failsafe or surefire (e.g. "spec"?)
* In your tests you call p.done() after a bunch of asserts.  This can lead to 
some stranded pipelines/intermediate files if any test fails.  try to ensure 
calling pipeline.done()
* In HCatTestSuite, you have a variable named "hbaseTmpDir" but nothing in that 
class actually deals with HBase.  Should just be hadoopTmpDir.
* In HCatTestSuite, "databaseLocation" make that a temporary folder to ensure 
it doesn't exist under the project to accidentally be checked in/linger between 
runs and also the JUnit TemporaryFolder will ensure the data will be cleaned up 
at the end of the suite.  It'll also assume less about the launching/running 
location.  Also then in your cleanup you can do the cleanup using the folder vs 
a specific file.  Also why cleanup using Hadoop FileSystem vs Java File objects?
* Javadoc for  FromHCat seems to indicate the default db is "database" when it 
should be "default"
* Can likely remove the site.xml from the project.  It is not the right file 
but also looks like that only exists in certain projects.


Other than that +1 from me.

> Create HCatSource and HCatTarget
> --------------------------------
>
>                 Key: CRUNCH-340
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-340
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Chao Shi
>            Assignee: Stephen Durfey
>         Attachments: 0001-CRUNCH-340-added-HCatSource-HCatTarget.patch, 
> CRUNCH-340.patch, crunch-340-v2.patch, crunch-340-v3.patch, 
> crunch-340-v4.patch, crunch-340.patch
>
>
> This patch adds HCatSource, which enables crunch pipeline to read from Hive 
> tables. This is the very first version, leaving a few TODOs in code.
> It adds new dependency from crunch-core to hcatalog (as well as several hive 
> components). I guess maybe we should create a new subproject (e.g. 
> crunch-hcatalog) rather than add it into crunch-core.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to