[ 
https://issues.apache.org/jira/browse/HTRACE-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333197#comment-15333197
 ] 

Colin Patrick McCabe commented on HTRACE-362:
---------------------------------------------

Thanks for posting this, [~nisala12]!  Very encouraging!

It's more traditional to attach a patch to the JIRA than to use a reference to 
an external site.  We can certainly do either one, but it makes it hard to 
refer to specific versions of patches when you just link to a branch which 
changes over time.  I attached the code that is there now to the JIRA as 
HTRACE-362.001.patch to make it easier for others to review this specific 
version.  One tip is that in github, you can get the patch version of a branch 
by doing something like this:
{{wget 
https://github.com/nisalanirmana/incubator-htrace/commit/6752709e6d078465f55f88233ec3357328aa6d1e.patch}}
 (notice the ".patch" at the end).

{code}
+  <parent>
+    <artifactId>htrace</artifactId>
+    <groupId>org.apache.htrace</groupId>
+    <version>4.1.0-incubating-SNAPSHOT</version>
+    <relativePath>..</relativePath>
+  </parent>
{code}
Version 4.1 of HTrace has already been released.  This should be version 
4.2.0-incubating-SNAPSHOT.

{code}
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kudu.version>1.0.0-SNAPSHOT</kudu.version>
+    <hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
+    <createDependencyReducedPom>true</createDependencyReducedPom>
+  </properties>
{code}
It's good that you are using UTF8.  However, a few questions:
* I don't see why Hadoop is needed here?  Kudu is a separate project from 
Hadoop; it's client doesn't depend on Hadoop as far as I know.
* Has Kudu published a non-snapshot version of its client library?  We 
generally should not be depending on SNAPSHOT versions of things in Maven if we 
can help it, since it makes for non-repeatable builds.

{code}
+  <repositories>
+    <repository>
+      <id>cdh.repo</id>
+      <name>Cloudera Repositories</name>
+      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
+      <snapshots>
+        <enabled>false</enabled>
+      </snapshots>
+    </repository>
+  </repositories>
{code}
Shouldn't Kudu be available from Apache repos at this point?  We generally do 
not include vendor repos in upstream projects.

I can see that you are manually serializing the span fields to protobuf and 
then storing it in Kudu.  However, this approach discards all the benefits of 
Kudu's knowledge about types and columnar in-memory data optimizations.  
Instead of storing everything as a binary blob, you should store it by setting 
up an appropriate schema in Kudu.  Another benefit is that by using Kudu's own 
type system, you will also not need to drag in all the protobuf dependencies.

{{KuduHTraceConfiguration.java}}: I don't see any point to this file.  The way 
we configure HTrace is determined by the system we're integrating with, not the 
span receiver we are using.  Span Receivers are very specifically isolated from 
configuration details.

{{KuduClientConfiguration.java}}: why is this class mutable?  These fields 
should be {{final}} since we don't plan on changing them after the 
{{SpanReceiver}} is constructed.  Otherwise we have to worry about what could 
happen if someone changes these fields later.

{code}
+  public KuduClient buildClient() {
+    KuduClientBuilder builder = new KuduClient
+            .KuduClientBuilder(host.concat(":").concat(port));
+    if (Integer.valueOf(workerCount) != null) {
+      builder.workerCount(workerCount);
+    }
+    if (Integer.valueOf(bossCount) != null) {
+      builder.bossCount(bossCount);
+    }
+    if (!Boolean.valueOf(isStatisticsEnabled)) {
+      builder.disableStatistics();
+    }
+    if (Long.valueOf(adminOperationTimeout) != null) {
+      builder.defaultAdminOperationTimeoutMs(adminOperationTimeout);
+    }
+    if (Long.valueOf(operationTimeout) != null) {
+      builder.defaultOperationTimeoutMs(operationTimeout);
+    }
+    if (Long.valueOf(socketReadTimeout) != null) {
{code}
{{valueOf}} a primitive integer or long will never be {{null}}, so these checks 
don't accomplish anything.  It seems like perhaps the intention here was to 
avoid explicitly setting these quantities unless they were explicitly 
configured, which is a nice goal.  It would allow Kudu's own defaults to be 
honored when the user had no preferences.  However, the code is not 
accomplishing this at the moment.  Suggest storing the elements as boxed 
integers and longs if it is important that they be nullable.

{{KuduConstants.java}}: I don't know if "KuduCinstants" is a good name for this 
file since that suggests something internal to kudu, not to our span receiver.
{code}
+  public static final String KUDU_MASTER_HOST_KEY = "htrace.kudu.master.host";
+  public static final String DEFAULT_KUDU_MASTER_HOST = "127.0.0.1";
...
{code}
One minor thing here: The typical pattern is {{KUDU_MASTER_HOST_KEY}} / 
{{KUDU_MASTER_HOST_KEY_DEFAULT}}

One not-so-minor thing: configuration keys accepted by span receivers should 
not start with "htrace."  The configuration objects used by end-users (such as 
the Hadoop one) strip off prefixes like "htrace." before passing them to the 
span receiver.

> Apache KUDU Span receiver implementation for Apache HTrace
> ----------------------------------------------------------
>
>                 Key: HTRACE-362
>                 URL: https://issues.apache.org/jira/browse/HTRACE-362
>             Project: HTrace
>          Issue Type: Bug
>            Reporter: Nisala Mendis
>            Assignee: Nisala Mendis
>         Attachments: HTRACE-362.001.patch, kudu_span_receiver_basic_design.pdf
>
>
> Implementation should be carried two components as span receiver that writes 
> kudu back end and span viewer to view written span/traces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to