I think Nick brings up some good points.  Would there ever be a reason
not to use UTF-8 as the default from the point a message is parsed onward?
All the tools we use for analytics work with UTF-8 (am I wrong?).

The only case I can see needing a configurable charset would be if a
message coming from a sensor were encoded in a charset other than UTF-8.  In
that case there would need to be a configurable charset per parser (in
parser config?) for decoding, but the message could be encoded/decoded with
UTF-8 after that.  Or we could just make UTF-8 encoding a prerequisite for
sending messages to Metron.
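
To make the per-parser idea concrete, here is a rough sketch of what I have
in mind.  The class and method names are made up for illustration and are
not existing Metron code, and I am assuming the charset would arrive as a
plain string from the parser config: decode once with the sensor's charset
(UTF-8 if nothing is set), then use UTF-8 explicitly everywhere after that.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of Metron.
public class SensorCharsetSketch {

    // Decode raw sensor bytes using the charset configured for that parser.
    // Falls back to UTF-8 when no charset is configured (null or empty).
    // Charset.forName throws if the configured name is invalid or unsupported.
    public static String decode(byte[] raw, String configuredCharset) {
        Charset charset = (configuredCharset == null || configuredCharset.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(configuredCharset);
        return new String(raw, charset);
    }

    // After parsing, every stage encodes with an explicit UTF-8 charset
    // instead of relying on the JVM default.
    public static byte[] toUtf8(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A sensor that emits ISO-8859-1 rather than UTF-8.
        byte[] fromSensor = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String decoded = decode(fromSensor, "ISO-8859-1");
        byte[] normalized = toUtf8(decoded);

        // The text survives the round trip even though the byte lengths differ.
        System.out.println(decoded.equals(new String(normalized, StandardCharsets.UTF_8)));
        System.out.println(fromSensor.length + " -> " + normalized.length);
    }
}
```

Passing the charset explicitly in both places should also clear the
default-charset warnings, since nothing falls back to the JVM default.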

On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <n...@nickallen.org> wrote:

> Per (2), I think it makes sense to make the charset configurable, but with
> the proposal of 3 separate settings, wouldn't things blow up horribly if
> the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?  They
> are not even speaking the same language, no?
>
> This makes me think that we need to understand the scenarios under which a
> user would want to change the charset, before we know what kinds of levers
> to bake-in.  What sort of use case would drive someone to change the
> charset?  Would there be a particular sensor producing telemetry with a
> different charset from others?  If one source of telemetry is different
> than the others, would the entire system even work with varying charsets?
>
> Without a good understanding of use cases, I think the only mildly safe
> thing to do right now is to have a single, global charset setting.  The
> user would have to ensure that their entire environment and all the JVMs
> driving it are set to use that charset.
>
> Perhaps my questioning comes from a lack of understanding of charsets.  I
> can't remember ever having to deviate from the default.  Please chime in
> and educate me, if I am off base.
>
> On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jjmey...@gmail.com> wrote:
>
> > Hello everybody,
> >
> > Currently our build has a significant number of warnings (most from the
> > error prone plugin). A lot of this has been documented here:
> > https://issues.apache.org/jira/browse/METRON-617
> >
> > I want to continue to work on this Jira. I really want to reduce the number
> > of warnings in our build. As the Jira points out, a lot of them are
> > unchecked casts or the implicit use of default charset.
> >
> > When starting this, I want to start with a specific module. I plan on
> > starting with `metron-interface` as it's fairly self contained and is
> > pretty new. Below I want to lay out what I plan on doing. Please let me know
> > what you all think:
> >
> > 1. Suppress warnings where generics are not supported or checking type is
> > not possible.
> > 2. For all unit tests that require supplying a charset I'll supply the
> > UTF-8 charset.
> > 3. Update the API to have a charset configuration for each resource that
> > needs one.
> > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> > prone doesn't support this level of suppression, which is probably a good
> > thing. We will need to explicitly state what we want to suppress instead.
> >
> > Two big questions came up right away when I was thinking about the above:
> >
> > 1. When suppressing warnings, I see we sometimes cast a JSONObject to
> > Map<String, Object>. I know it extends Map, but is it really safe to do
> > something like the following? Should we have a utility that truly builds a
> > map that uses generics? I plan on doing some testing around this, but if
> > anyone has any experience with this it would be greatly appreciated. Here
> > is an example of what I am talking about:
> >
> > JSONObject message = ...;
> > @SuppressWarnings("unchecked")
> > Map<String, Object> state = (Map<String, Object>) message;
> >
> >
> > 2. This one is about configuring charset (#3 above). Specifically in
> > metron-rest. Right now, I believe there are 3 sensor resources (index,
> > enrichment, and parser) that throw warnings due to implicitly using the
> > default charset. I propose that we have 3 configs (listed below). These
> > configs would take any valid charset, default, or not set. If not set then
> > UTF-8 would be default. Does this seem fair?
> >
> > sensor:
> >   index.encoding: UTF-8
> >   enrichment.encoding: UTF-8
> >   parser.encoding: UTF-8
> >
> >
> > Does anyone see any problems that may occur if we go about it this way? Any
> > help on this would be very much appreciated.
> >
> > Thanks,
> > JJ
> >
>
