[ https://issues.apache.org/jira/browse/CRUNCH-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282636#comment-16282636 ]
Micah Whitacre commented on CRUNCH-340:
---------------------------------------

Some comments:

* It'd be nice to redo the test setup to follow what crunch-core does, so you don't have to deal with custom failsafe/surefire config in the project (e.g. make the suite name end with "IT.java" and rename the individual tests in that suite to something that matches neither failsafe nor surefire, e.g. "spec"?).
* In your tests you call p.done() after a bunch of asserts. This can leave stranded pipelines/intermediate files if any test fails. Try to ensure pipeline.done() is always called.
* In HCatTestSuite, you have a variable named "hbaseTmpDir", but nothing in that class actually deals with HBase. It should just be hadoopTmpDir.
* In HCatTestSuite, make "databaseLocation" a temporary folder so that it doesn't live under the project, where it could accidentally be checked in or linger between runs. The JUnit TemporaryFolder will also ensure the data is cleaned up at the end of the suite, and it assumes less about the launching/running location. Your cleanup can then work on the folder rather than a specific file. Also, why clean up using the Hadoop FileSystem rather than Java File objects?
* The Javadoc for FromHCat seems to indicate the default db is "database" when it should be "default".
* You can likely remove the site.xml from the project. It is not the right file, and it also looks like that file only exists in certain projects.

Other than that, +1 from me.

> Create HCatSource and HCatTarget
> --------------------------------
>
>         Key: CRUNCH-340
>         URL: https://issues.apache.org/jira/browse/CRUNCH-340
>     Project: Crunch
>  Issue Type: New Feature
>    Reporter: Chao Shi
>    Assignee: Stephen Durfey
> Attachments: 0001-CRUNCH-340-added-HCatSource-HCatTarget.patch, CRUNCH-340.patch, crunch-340-v2.patch, crunch-340-v3.patch, crunch-340-v4.patch, crunch-340.patch
>
>
> This patch adds HCatSource, which enables crunch pipeline to read from Hive tables. This is the very first version, leaving a few TODOs in code.
> It adds new dependency from crunch-core to hcatalog (as well as several hive components). I guess maybe we should create a new subproject (e.g. crunch-hcatalog) rather than add it into crunch-core.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
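
A minimal sketch of two of the review points above: calling done() even when an assert fails, and cleaning up a temporary database location with plain Java file APIs. It is only an illustration under stated assumptions. StubPipeline, runWithCleanup and deleteRecursively are hypothetical names, not part of Crunch or the patch, and a real test would use the actual Crunch Pipeline and probably the JUnit TemporaryFolder rule rather than Files.createTempDirectory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class PipelineCleanupSketch {

  /** Hypothetical stand-in for a Crunch Pipeline; only done() matters here. */
  static class StubPipeline {
    boolean doneCalled = false;

    void done() {
      doneCalled = true;
    }
  }

  /** Runs the assertions, then always calls done(), even if an assertion throws. */
  static void runWithCleanup(StubPipeline p, Runnable assertions) {
    try {
      assertions.run();
    } finally {
      p.done();
    }
  }

  /** Deletes a directory tree with plain java.nio calls, children before parents. */
  static void deleteRecursively(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      paths.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.delete(path);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumption: a temp directory outside the project stands in for databaseLocation.
    Path databaseLocation = Files.createTempDirectory("hcat-test");
    StubPipeline p = new StubPipeline();
    try {
      runWithCleanup(p, () -> {
        throw new AssertionError("simulated failing assert");
      });
    } catch (AssertionError expected) {
      // The failure still propagates to the caller; done() has already run.
    } finally {
      deleteRecursively(databaseLocation);
    }
    System.out.println("done called: " + p.doneCalled);
    System.out.println("temp dir exists: " + Files.exists(databaseLocation));
  }
}
```

The try/finally keeps the test failure visible while still releasing the pipeline, and deleting the whole folder avoids tracking individual files.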