[
https://issues.apache.org/jira/browse/HTRACE-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333197#comment-15333197
]
Colin Patrick McCabe commented on HTRACE-362:
---------------------------------------------
Thanks for posting this, [~nisala12]! Very encouraging!
It's more traditional to attach a patch to the JIRA than to use a reference to
an external site. We can certainly do either one, but it makes it hard to
refer to specific versions of patches when you just link to a branch which
changes over time. I attached the code that is there now to the JIRA as
HTRACE-362.001.patch to make it easier for others to review this specific
version. One tip is that in github, you can get the patch version of a branch
by doing something like this:
{{wget
https://github.com/nisalanirmana/incubator-htrace/commit/6752709e6d078465f55f88233ec3357328aa6d1e.patch}}
(notice the ".patch" at the end).
{code}
+ <parent>
+ <artifactId>htrace</artifactId>
+ <groupId>org.apache.htrace</groupId>
+ <version>4.1.0-incubating-SNAPSHOT</version>
+ <relativePath>..</relativePath>
+ </parent>
{code}
Version 4.1 of HTrace has already been released. This should be version
4.2.0-incubating-SNAPSHOT.
{code}
+ <properties>
+ <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+ <kudu.version>1.0.0-SNAPSHOT</kudu.version>
+ <hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
+ <createDependencyReducedPom>true</createDependencyReducedPom>
+ </properties>
{code}
It's good that you are using UTF8. However, a few questions:
* I don't see why Hadoop is needed here? Kudu is a separate project from
Hadoop; it's client doesn't depend on Hadoop as far as I know.
* Has Kudu published a non-snapshot version of its client library? We
generally should not be depending on SNAPSHOT versions of things in Maven if we
can help it, since it makes for non-repeatable builds.
{code}
+ <repositories>
+ <repository>
+ <id>cdh.repo</id>
+ <name>Cloudera Repositories</name>
+ <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
+ <snapshots>
+ <enabled>false</enabled>
+ </snapshots>
+ </repository>
+ </repositories>
{code}
Shouldn't Kudu be available from Apache repos at this point? We generally do
not include vendor repos in upstream projects.
I can see that you are manually serializing the span fields to protobuf and
then storing it in Kudu. However, this approach discards all the benefits of
Kudu's knowledge about types and columnar in-memory data optimizations.
Instead of storing everything as a binary blob, you should store it by setting
up an appropriate schema in Kudu. Another benefit is that by using Kudu's own
type system, you will also not need to drag in all the protobuf dependencies.
{{KuduHTraceConfiguration.java}}: I don't see any point to this file. The way
we configure HTrace is determined by the system we're integrating with, not the
span receiver we are using. Span Receivers are very specifically isolated from
configuration details.
{{KuduClientConfiguration.java}}: why is this class mutable? These fields
should be {{final}} since we don't plan on changing them after the
{{SpanReceiver}} is constructed. Otherwise we have to worry about what could
happen if someone changes these fields later.
{code}
+ public KuduClient buildClient() {
+ KuduClientBuilder builder = new KuduClient
+ .KuduClientBuilder(host.concat(":").concat(port));
+ if (Integer.valueOf(workerCount) != null) {
+ builder.workerCount(workerCount);
+ }
+ if (Integer.valueOf(bossCount) != null) {
+ builder.bossCount(bossCount);
+ }
+ if (!Boolean.valueOf(isStatisticsEnabled)) {
+ builder.disableStatistics();
+ }
+ if (Long.valueOf(adminOperationTimeout) != null) {
+ builder.defaultAdminOperationTimeoutMs(adminOperationTimeout);
+ }
+ if (Long.valueOf(operationTimeout) != null) {
+ builder.defaultOperationTimeoutMs(operationTimeout);
+ }
+ if (Long.valueOf(socketReadTimeout) != null) {
{code}
{{valueOf}} a primitive integer or long will never be {{null}}, so these checks
don't accomplish anything. It seems like perhaps the intention here was to
avoid explicitly setting these quantities unless they were explicitly
configured, which is a nice goal. It would allow Kudu's own defaults to be
honored when the user had no preferences. However, the code is not
accomplishing this at the moment. Suggest storing the elements as boxed
integers and longs if it is important that they be nullable.
{{KuduConstants.java}}: I don't know if "KuduCinstants" is a good name for this
file since that suggests something internal to kudu, not to our span receiver.
{code}
+ public static final String KUDU_MASTER_HOST_KEY = "htrace.kudu.master.host";
+ public static final String DEFAULT_KUDU_MASTER_HOST = "127.0.0.1";
...
{code}
One minor thing here: The typical pattern is {{KUDU_MASTER_HOST_KEY}} /
{{KUDU_MASTER_HOST_KEY_DEFAULT}}
One not-so-minor thing: configuration keys accepted by span receivers should
not start with "htrace." The configuration objects used by end-users (such as
the Hadoop one) strip off prefixes like "htrace." before passing them to the
span receiver.
> Apache KUDU Span receiver implementation for Apache HTrace
> ----------------------------------------------------------
>
> Key: HTRACE-362
> URL: https://issues.apache.org/jira/browse/HTRACE-362
> Project: HTrace
> Issue Type: Bug
> Reporter: Nisala Mendis
> Assignee: Nisala Mendis
> Attachments: HTRACE-362.001.patch, kudu_span_receiver_basic_design.pdf
>
>
> Implementation should be carried two components as span receiver that writes
> kudu back end and span viewer to view written span/traces.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)