[
https://issues.apache.org/jira/browse/PHOENIX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124311#comment-14124311
]
James Taylor commented on PHOENIX-1241:
---------------------------------------
My proposal is that we provide a simple mechanism where we look for the
annotation key as a column name in the Phoenix tracing table. If we find a
match, then that determines the type in which the data is serialized into the
trace table (since a column always has a type associated with it). This is
optional in that if the annotation key is *not found*, then we store it as
VARBINARY without doing any type conversion.
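As a rough illustration of the lookup described above, here is a minimal Java sketch. It assumes the declared columns of the tracing table have already been loaded into a map of column name to SQL type name. The class AnnotationTypeResolver and its method names are hypothetical and are not part of Phoenix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Phoenix class): picks the storage type for an annotation key.
final class AnnotationTypeResolver {
    // Type used when the annotation key is not declared as a column.
    static final String FALLBACK_TYPE = "VARBINARY";

    // Column name -> SQL type name, assumed to be read from the tracing table's metadata.
    private final Map<String, String> declaredColumns = new HashMap<>();

    AnnotationTypeResolver(Map<String, String> declaredColumns) {
        this.declaredColumns.putAll(declaredColumns);
    }

    // Returns the declared column type if the key matches a column name exactly,
    // otherwise falls back to VARBINARY (no type conversion).
    String resolve(String annotationKey) {
        return declaredColumns.getOrDefault(annotationKey, FALLBACK_TYPE);
    }
}
```

The match is exact and case-sensitive here, which is an assumption about how quoted annotation column names would be compared.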
Since the Phoenix tracing table may be overridden by the user, this provides
enough flexibility - they could always provide a tracing table without any of
these type mappings in which case the value serialized to the sink would just
be passed through as a VARBINARY value.
For the Phoenix specific annotations, we know the type since we're the ones
producing it, so we might as well add these as columns to the default tracing
table. This is not a requirement, though. If an annotation key is not found as
a column, it'll always fall back to just storing it as a VARBINARY.
I don't think we need to worry about the case where the same annotation key has
different serialization formats. This will handle Eli's case as well - users
that add user-specified annotations can add them to their trace table as well,
and we can add some default ones for tenant_id, for example.
Here's an example, where say there is a
phoenix.trace.annotations.queryDuration annotation. I don't know what an actual
dynamic column will be in the annotations column family based on what we're
tracing, but hopefully this example makes sense. In this case, we could declare
the tracing table like this instead (or a user could do this as well by
defining their own tracing table name):
{code}
CREATE TABLE SYSTEM.TRACING_STATS (
trace_id BIGINT NOT NULL,
parent_id BIGINT NOT NULL,
span_id BIGINT NOT NULL,
description VARCHAR,
start_time BIGINT,
end_time BIGINT,
hostname VARCHAR,
tags.count SMALLINT,
annotations.count SMALLINT,
annotations."phoenix.trace.annotations.queryDuration" BIGINT,
CONSTRAINT pk PRIMARY KEY (trace_id, parent_id, span_id)
);
{code}
So it's just a simple, general way for users to _declare_ the type of a
particular annotation.
If annotations are serialized into the metrics sink in different ways, for
example if for some HBase metrics the annotation is serialized as a long while
other annotations are serialized as a string, then we'll need to think about
how best to handle this. My proposal above only covers the *destination* type
of the annotation as it's stored in the Phoenix tracing table. It does not
cover the case where the *source* type of annotations varies in how it's
serialized (and, consequently, in how the PhoenixTableMetricsWriter will
interpret the data when it *reads* it prior to writing it to the Phoenix
table). Is there a variation here, or is everything a string when the
PhoenixTableMetricsWriter sees it?
If there is variation, then we can still use the above mechanism where the
column that's declared determines both how the value is interpreted when it's
read as well as how it's written to the Phoenix tracing table.
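To make the last point concrete, here is a small Java sketch of letting the declared column type drive how a value is interpreted before it is written. It assumes the simplest case, where the writer sees every annotation value as a string. The class AnnotationValueConverter is hypothetical, is not the PhoenixTableMetricsWriter itself, and handles only a few type names for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical converter (not a Phoenix class): interprets a string-valued
// annotation according to the type declared for its column.
final class AnnotationValueConverter {
    static Object convert(String declaredType, String rawValue) {
        switch (declaredType) {
            case "BIGINT":
                return Long.parseLong(rawValue);    // stored as a 64-bit integer
            case "SMALLINT":
                return Short.parseShort(rawValue);  // stored as a 16-bit integer
            case "VARCHAR":
                return rawValue;                    // stored as-is
            default:
                // Undeclared or unhandled type: pass the bytes through (VARBINARY).
                return rawValue.getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

If the source types do vary, the same switch could take the raw bytes instead of a string and decode them according to the declared type.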
> Add typing to trace annotations
> -------------------------------
>
> Key: PHOENIX-1241
> URL: https://issues.apache.org/jira/browse/PHOENIX-1241
> Project: Phoenix
> Issue Type: Sub-task
> Affects Versions: 5.0.0, 4.1.1
> Reporter: Jesse Yates
> Fix For: 5.0.0, 4.1.1
>
>
> Currently traces only support storing string valued annotations - this works
> for known trace sources. However, phoenix will have trace annotations with
> specific types. We can improve the storage format to know about these custom
> types, rather than just storing strings, making the query interface more
> powerful.
> See PHOENIX-1226 for more discussion
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)