One nit: on the parser side of things, this has nothing to do with any
dataset definition. It's the type parameter on the create feed DDL statement
that should be the parser's guide. (In general, the optional function on
the feed may change the type by the time the data reaches a dataset.)
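
To make that concrete, here is a minimal sketch (not AsterixDB code) of a
null check driven by the feed's declared type rather than by any dataset
definition. FieldSpec, FeedRecordFilter, and the Map-based record are
hypothetical stand-ins invented for illustration; the real parser and
FeedRecordDataFlowController work on serialized bytes and have their own
APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one field of the type named in the create feed statement. */
final class FieldSpec {
    final String name;
    final boolean nullable;

    FieldSpec(String name, boolean nullable) {
        this.name = name;
        this.nullable = nullable;
    }
}

/** Hypothetical filter that sits between the parser and the step that adds tuples. */
final class FeedRecordFilter {
    private final List<FieldSpec> feedType;
    private long discarded;

    FeedRecordFilter(List<FieldSpec> feedType) {
        this.feedType = feedType;
    }

    /**
     * Returns true when every non-nullable field holds a non-null value.
     * Map.get returns null for an absent key too, so this sketch treats
     * an absent field and an explicit null the same way.
     */
    boolean conforms(Map<String, Object> record) {
        for (FieldSpec field : feedType) {
            if (!field.nullable && record.get(field.name) == null) {
                return false;
            }
        }
        return true;
    }

    /** Keeps conforming records and counts the rest without throwing or logging. */
    List<Map<String, Object>> filter(List<Map<String, Object>> parsed) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> record : parsed) {
            if (conforms(record)) {
                kept.add(record);
            } else {
                discarded++;
            }
        }
        return kept;
    }

    long discardedCount() {
        return discarded;
    }
}
```

With a feed type of (id: non-nullable, text: nullable), a record whose id is
null is dropped and only counted, while a record whose text is null passes
through. No exception is raised for the dropped record.
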
On Apr 30, 2016 3:26 PM, "Xikui Wang" <xik...@uci.edu> wrote:

> Hi Abdullah,
>
> Actually, I am also concerned that adding a null check for the general
> case will bring extra overhead. So I plan to add the check after the
> parser but before addTuple, i.e. in FeedRecordDataFlowController. But
> based on what I have seen so far, it seems RecordType is transparent to
> FeedRecordDataFlowController. So I am still investigating that...
>
> I saw the null check in the ADM parser. That's actually a viable way to
> handle it within the parser's scope. But I am looking for a slightly
> different solution. From my perspective, the ADM parser assumes the input
> ADM conforms to the dataset definition, so it's reasonable for it to
> throw an exception. For Tweetparser, if I see a null value on a non-null
> attribute, I will discard the whole tweet directly, and may not even log
> it (as there are too many tweets with nulls). That's why I want to put
> the check in FeedRecordDataFlowController: I didn't see a good way to
> prevent a record insert in the parser other than throwing an exception.
>
> Not sure whether my opinion makes sense. Feel free to comment. :)
>
> Best,
> Xikui
>
> On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <bamou...@gmail.com>
> wrote:
>
> > Adding a few points here:
> >
> > My feeling is SerializerDeserializer offers another level of
> > abstraction, but with output I can write the value directly without
> > constructing an AType object.
> > I am wondering if there is any preference between these two?
> >
> > - Using the SerializerDeserializer option, you will only create a single
> > object regardless of the number of parsed records, so I wouldn't worry
> > about it. Code maintainability takes precedence here IMO.
> > - In addition to records and lists, UTF8StringSerializerDeserializer can
> > be stateful for the same reason (avoid creating lots of unneeded
> > objects). In fact, our parsers use the stateful
> > UTF8StringSerializerDeserializer since I noticed that using the
> > stateless one creates lots of byte[] and triggers GC over and over.
> > - Right now, we parse missing values as null. Should that change?
> > - There is definitely a check for nulls on non-nullable values, at least
> > in the ADM parser. There might be a bug, however, that makes it accept
> > explicit null values, and that should be fixed.
> >
> > I am for NOT using the cast-record solution because of the overhead it
> > will add, but that is just me :)
> > ~Abdullah.
> >
> >
> > On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <xik...@uci.edu> wrote:
> >
> > > Thank you Yingyi. I will try to figure out a solution from that
> > > direction.
> > >
> > > Best,
> > > Xikui
> > >
> > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <buyin...@gmail.com> wrote:
> > >
> > > > Yeah, I think so:-)
> > > >
> > > > Best,
> > > > Yingyi
> > > >
> > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <dtab...@gmail.com> wrote:
> > > >
> > > > > This indeed might be cleaner?
> > > > >
> > > > >
> > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > > > >
> > > > > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > > > > >>> if you want to handle general cases in that region.
> > > > > >>
> > > > > >> Or, you can inject a cast-record function in the loading pipeline
> > > > > >> so that you can defer the runtime-type-check/cast to that function
> > > > > >> instead of doing that in the parser.
> > > > >>
> > > > >>
> > > > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <buyin...@gmail.com> wrote:
> > > > >>
> > > > > >>> My answer is inlined.
> > > > > >>>
> > > > > >>>> My feeling is SerializerDeserializer offers another level of
> > > > > >>>> abstraction, but with output I can write the value directly
> > > > > >>>> without constructing an AType object.
> > > > > >>>> I am wondering if there is any preference between these two?
> > > > > >>>
> > > > > >>> I agree with you. However, a SerializerDeserializer has to be
> > > > > >>> stateless, hence it cannot be used at runtime for complex type
> > > > > >>> objects such as records and lists, because it will create a lot
> > > > > >>> of Java objects.
> > > > >>>
> > > > > >>>> In other words, the parser has to guarantee that the processed
> > > > > >>>> records match the dataset definition (a non-optional attribute
> > > > > >>>> cannot have a null value). I tried assigning a null value to
> > > > > >>>> non-null attributes. The record is inserted successfully, but
> > > > > >>>> reading the records back causes problems.
> > > > > >>>
> > > > > >>> That sounds right to me. Please file a JIRA issue and assign it
> > > > > >>> to yourself (if you're working on that).
> > > > > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > > > > >>> if you want to handle general cases in that region.
> > > > >>>
> > > > > >>>> 3. Set to null or skip
> > > > > >>>> For optional (nullable) attributes, if I want to insert a record
> > > > > >>>> with a null value on that attribute, should I assign a null
> > > > > >>>> value or should I just skip it? (Probably this is related to
> > > > > >>>> the missing attribute that Yingyi mentioned today?)
> > > > > >>>
> > > > > >>> Assign a null value.
> > > > > >>> Missing means the field doesn't exist in a record at all.
> > > > >>>
> > > > >>> Best,
> > > > >>> Yingyi
> > > > >>>
> > > > >>>
> > > > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <xik...@uci.edu> wrote:
> > > > > >>>
> > > > > >>>> Hi devs,
> > > > >>>>
> > > > > >>>> I came across several questions while I was constructing records
> > > > > >>>> in AsterixDB. Hope someone can help me clear up the confusion. :)
> > > > >>>>
> > > > > >>>> 1. Write directly to data output or use SerializerDeserializer
> > > > > >>>> I am working with AbstractDataParser now. I see people using
> > > > > >>>> different ways to append attributes to data output. Either use:
> > > > > >>>> output.write(typetag.serialize());
> > > > > >>>> output.writeInt(0);
> > > > > >>>> to write into data output directly, or
> > > > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize
> > > > > >>>> an AINT8 instance to output. *SerializerDeserializer uses
> > > > > >>>> writeByte to write output.
> > > > >>>>
> > > > > >>>> My feeling is SerializerDeserializer offers another level of
> > > > > >>>> abstraction, but with output I can write the value directly
> > > > > >>>> without constructing an AType object.
> > > > > >>>> I am wondering if there is any preference between these two?
> > > > >>>>
> > > > > >>>> 2. RecordType validation after parser but before add to frame?
> > > > > >>>> My observation is that after the parser finishes writing the
> > > > > >>>> output and passes it to the next level, there is no validation
> > > > > >>>> that checks whether a non-optional field is null. In other
> > > > > >>>> words, the parser has to guarantee that the processed records
> > > > > >>>> match the dataset definition (a non-optional attribute cannot
> > > > > >>>> have a null value). I tried assigning a null value to non-null
> > > > > >>>> attributes. The record is inserted successfully, but reading
> > > > > >>>> the records back causes problems.
> > > > >>>>
> > > > > >>>> 3. Set to null or skip
> > > > > >>>> For optional (nullable) attributes, if I want to insert a record
> > > > > >>>> with a null value on that attribute, should I assign a null
> > > > > >>>> value or should I just skip it? (Probably this is related to
> > > > > >>>> the missing attribute that Yingyi mentioned today?)
> > > > >>>>
> > > > >>>> Thanks for your help.
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Xikui
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >
> > > >
> > >
> >
>
