Todd Lipcon has posted comments on this change.

Change subject: - moved the kudu slume sink code from the flume repo to the 
kudu repo - applied Mike Percy's notes
......................................................................


Patch Set 1:

(19 comments)

Looks like a good start. I left a bunch of comments, but mostly small/stylistic 
stuff.

It would also be great to have a simple test here. Feel free to drop by our 
Slack channel or mailing list if you need any pointers on where to start with a 
test.

http://gerrit.cloudera.org:8080/#/c/2383/1//COMMIT_MSG
Commit Message:

Line 7: - moved the kudu slume sink code from the flume repo to the kudu repo
nit: can you format the commit message with a one-line summary, followed by a 
blank line, followed by the longer description? 
http://chris.beams.io/posts/git-commit/ has a good explanation of what you 
might want to say. In particular, the commit message should be a record of 
what's being added, not the diff since the previous revision (since looking 
back on this next year we probably won't remember that Mike made any notes)

nit: typo "flume" not "slume"


http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 30:         <groupId>org.apache.rat</groupId>
do we need the rat plugin here in the subproject? I would have thought it would 
be defined only at the top level pom


Line 45:       <artifactId>flume-ng-core</artifactId>
hm, I'm surprised that we have a dependency on 'core' and not just the 'sdk'. 
But, I dont know anything about Flume. Mike?


http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java
File 
java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java:

Line 31:  * of an event to write them to Kudu. This is configurable, so any 
config
"serializer" is a bit of a strange term for this... we're not actually 
serializing (converting to a byte stream) but rather transforming a flume event 
into a set of kudu operations. Maybe KuduEventTransformer or KuduRowProducer or 
EventToKuduConverter or something?


Line 39:    * @param table
missing description


Line 46:    * @return List of {@link org.kududb.client.PartialRow} which
this seems to contradict the actual return type


http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java
File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java:

Line 48:  *
nit: extra line


Line 51:  * <tt>table: </tt> The name of the table in Hbase to write to. <p>
should say kudu, not hbase


Line 84:     Preconditions.checkArgument(table == null || client == null || 
session == null, "Please call stop " +
should be checkState


Line 99:           " from Kudu", e);
nit: do something like:

  String msg = "Could not open table '" + tableName + "' from Kudu";

to avoid duplication. Also please quote the table name as above


Line 133:     //If not specified, will use HBase defaults.
- Kudu, not hbase. Not sure what this is referring to (what's the kudu 
default?) Do you mean 'use the default SimpleKuduEventSerializer'?
- nit: space after '// '


Line 141:     if(eventSerializerType == null || eventSerializerType.isEmpty()) {
style: space before (


Line 170:       //session object.
nit: add spaces after //


Line 177:     long i = 0;
can you define this inside the try block? it doesn't seem to be used outside of 
it.


Line 194:           sinkCounter.addToEventDrainAttemptCount(1);
seems odd to be incrementing the 'event drain attempt count' here (once per 
input event), but the 'event drain success count' down below (once per flush). 
Should this and the 'batch complete count' counter increment be down outside 
the for loop perhaps?


Line 231:             "Transaction rolled back.", e);
this string is repeated 4 times - can you extract a constant?


http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSinkConfigurationConstants.java
File 
java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSinkConfigurationConstants.java:

Line 28:   public static final String CONFIG_MASTER_HOST = "master";
given that we have some support for multi-master (even though it's experimental 
right now) we should probably name this 'masters'
- MASTER_ADDRESS is better than MASTER_HOST, because it can contain an optional 
port number


http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/SimpleKuduEventSerializer.java
File 
java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/SimpleKuduEventSerializer.java:

Line 40:  * pCol will be assumed.<p>
'pCol' is a kind of odd default. Maybe 'payload' would be better? You should 
also specify the type of column that this writes to (BINARY)


Line 73:       operations.add(insert);
why not just do 'return Collections.singletonList(insert);' here? This results 
in one fewer object allocation (singletonList just has a single non-array 
member, whereas using arraylist causes the extra allocation for the T[] array 
inside)


-- 
To view, visit http://gerrit.cloudera.org:8080/2383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9c9897a0645e0bf43606e3bd07d648eeafd4224
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Ara Ebrahimi <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to