@Yingyi for null/missing guidance?

On 4/30/16 1:52 PM, abdullah alamoudi wrote:
Adding a few points here:

My feeling is SerializerDeserializer offers another level of abstraction,
but with output I can write values directly without constructing an AType object.
I am wondering if there is any preference between these two?

- Using the SerializerDeserializer option, you will only create a single
object regardless of the number of parsed records, so I wouldn't worry
about it. Code maintainability takes precedence here IMO.
- In addition to records and lists, UTF8StringSerializerDeserializer can be
stateful for the same reason (to avoid creating lots of unneeded objects). In
fact, our parsers use the stateful UTF8StringSerializerDeserializer since I
noticed that using the stateless one creates lots of byte[] objects and triggers GC
over and over.
- Right now, we parse missing values as null. Should that change?
- There is definitely a check for nulls on non-nullable values, at least in
the ADM parser. However, there might be a bug that makes it accept explicit
null values, and that should be fixed.
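A minimal plain-Java sketch of the "stateful" idea behind the stateful UTF8StringSerializerDeserializer, assuming nothing about AsterixDB's real classes: `ReusableUtf8Writer` is a hypothetical name, and the 4-byte-length-plus-UTF-8 layout is an illustrative assumption, not AsterixDB's actual string format. The point is only that one scratch buffer is reused for every parsed value, so no per-value byte[] is allocated once the buffer is large enough.

```java
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical stateful string writer: one scratch buffer reused across calls.
// Layout (4-byte length, then UTF-8 bytes) is assumed for illustration only.
final class ReusableUtf8Writer {
    private byte[] scratch = new byte[64];

    void write(String s, DataOutput out) throws IOException {
        int len = encode(s);
        out.writeInt(len);
        out.write(scratch, 0, len);
    }

    // Encodes s into the scratch buffer and returns the number of bytes used.
    private int encode(String s) {
        // Each UTF-16 char needs at most 3 bytes (a 2-char surrogate pair needs 4).
        int needed = s.length() * 3;
        if (scratch.length < needed) {
            // Grow only when a value is larger than anything seen so far.
            scratch = new byte[Math.max(needed, scratch.length * 2)];
        }
        int pos = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            // Note: lone surrogates are not treated specially in this sketch.
            if (cp < 0x80) {
                scratch[pos++] = (byte) cp;
            } else if (cp < 0x800) {
                scratch[pos++] = (byte) (0xC0 | (cp >> 6));
                scratch[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                scratch[pos++] = (byte) (0xE0 | (cp >> 12));
                scratch[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                scratch[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                scratch[pos++] = (byte) (0xF0 | (cp >> 18));
                scratch[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                scratch[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                scratch[pos++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return pos;
    }
}
```

One writer instance per parser would be enough, which matches the point above that a single object is created regardless of the number of parsed records.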

I am for NOT using the cast-record solution because of the overhead it will add,
but that is just me :)
~Abdullah.


On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <xik...@uci.edu> wrote:

Thank you Yingyi. I will try to figure out a solution from that direction.

Best,
Xikui

On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <buyin...@gmail.com> wrote:

Yeah, I think so:-)

Best,
Yingyi

On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <dtab...@gmail.com> wrote:

This indeed might be cleaner?


On 4/29/16 3:28 PM, Yingyi Bu wrote:

I'm guessing that you can do similar things to CastRecordDescriptor
if you want to handle general cases in that region.

Or, you can inject a cast-record function into the loading pipeline
so that you can defer the runtime type check/cast to that function
instead of doing it in the parser.
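A rough sketch of the second option (deferring the type check/cast to a separate stage), under loudly stated assumptions: records are modeled as `Map<String, Object>`, the declared type as a map from field name to expected Java class, and `parse`, `castStage`, and `pipeline` are hypothetical stand-ins, not AsterixDB APIs. The parser stage emits whatever it read; the cast stage, composed after it with `Function.andThen`, does the type check and one simple widening (Integer to Long).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class CastPipelineSketch {
    // Hypothetical parser stage: "k=v,k=v" text -> untyped record.
    // Integer-looking values become Integer (assumes they fit in an int); others stay String.
    static Map<String, Object> parse(String line) {
        Map<String, Object> rec = new LinkedHashMap<>();
        for (String kv : line.split(",")) {
            String[] parts = kv.split("=", 2);
            String v = parts[1];
            rec.put(parts[0], v.matches("-?\\d+") ? (Object) Integer.valueOf(v) : v);
        }
        return rec;
    }

    // Hypothetical cast stage: checks each declared field's class and
    // widens Integer to Long where the declared class is Long.
    static Function<Map<String, Object>, Map<String, Object>> castStage(Map<String, Class<?>> declared) {
        return rec -> {
            Map<String, Object> out = new LinkedHashMap<>(rec);
            for (Map.Entry<String, Class<?>> f : declared.entrySet()) {
                Object v = out.get(f.getKey());
                if (v == null) {
                    continue; // null/missing handling is a separate concern here
                }
                if (Long.class.equals(f.getValue()) && v instanceof Integer) {
                    out.put(f.getKey(), ((Integer) v).longValue());
                } else if (!f.getValue().isInstance(v)) {
                    throw new IllegalArgumentException(
                        "field " + f.getKey() + " is not a " + f.getValue().getSimpleName());
                }
            }
            return out;
        };
    }

    // The parser stays type-agnostic; checking/casting happens in the next stage.
    static Function<String, Map<String, Object>> pipeline(Map<String, Class<?>> declared) {
        Function<String, Map<String, Object>> parser = CastPipelineSketch::parse;
        return parser.andThen(castStage(declared));
    }
}
```

The design trade-off Abdullah raises still applies: the extra stage copies every record, which is the overhead a parser-side check would avoid.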


On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <buyin...@gmail.com> wrote:
My answer is inlined.
My feeling is SerializerDeserializer offers another level of
abstraction, but with output I can write values directly without
constructing an AType object.
I am wondering if there is any preference between these two?
I agree with you. However, a SerializerDeserializer has to be
stateless, hence it cannot be used at runtime for complex-type objects
such as records and lists, because it would create a lot of Java objects.

In other words, the parser has to guarantee that the
processed records match the dataset definition (a non-optional
attribute cannot have a null value). I tried to assign null values to
non-null attributes. The records were inserted successfully, but reading
them back caused problems.

That sounds right to me. Please file a JIRA issue and assign it to
yourself (if you're working on that).
I'm guessing that you can do similar things to CastRecordDescriptor
if you want to handle general cases in that region.

3. Set to null or skip
For optional (nullable) attributes, if I want to insert a record with a
null value for that attribute, should I assign a null value or just skip
it? (Probably this is related to the missing attribute that Yingyi
mentioned today?)

Assign a null value.
Missing means the field doesn't exist in a record at all.
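To make the null/missing distinction concrete, a small sketch assuming a record is modeled as a Java `Map<String, Object>` (an illustration only, not AsterixDB's binary record layout): a field that is present with a null value is NULL, while a field whose key is absent is MISSING. `NullVsMissing`, `FieldState`, and `stateOf` are hypothetical names.

```java
import java.util.Map;

final class NullVsMissing {
    enum FieldState { MISSING, NULL, PRESENT }

    // Key absent -> MISSING; key present with a null value -> NULL.
    static FieldState stateOf(Map<String, Object> record, String field) {
        if (!record.containsKey(field)) {
            return FieldState.MISSING;
        }
        return record.get(field) == null ? FieldState.NULL : FieldState.PRESENT;
    }
}
```

Under the answer above, an optional attribute with no value would be written as an explicit null (the NULL case), not left out of the record.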

Best,
Yingyi


On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <xik...@uci.edu> wrote:

Hi devs,
I came across several questions while constructing records in
AsterixDB. I hope someone can help me clear up the confusion. :)

1. Write directly to data output or use SerializerDeserializer
I am working with AbstractDataParser now. I see people using different
ways to append attributes to the data output. Either use:
    output.write(typeTag.serialize());
    output.writeInt(0);
to write into the data output directly, or use an
AInt8SerializerDeserializer (e.g. int8Serde.serialize(aInt8, output)) to
serialize an AINT8 instance to the output. The *SerializerDeserializer
classes use writeByte to write to the output.

My feeling is SerializerDeserializer offers another level of
abstraction, but with output I can write values directly without
constructing an AType object.
I am wondering if there is any preference between these two?
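A self-contained sketch of the two styles being compared, using only java.io. Everything here is hypothetical: `INT32_TAG` is a made-up tag byte (AsterixDB's real ATypeTag values are not reproduced), `MutableInt32` stands in for a reusable mutable AType object, and `Int32Serde` stands in for a *SerializerDeserializer. In this sketch both paths emit the same bytes; the serializer path costs one reusable holder and one serializer per writer, not one object per record.

```java
import java.io.DataOutput;
import java.io.IOException;

final class WriteStylesSketch {
    // Hypothetical type tag byte; not AsterixDB's actual ATypeTag value.
    static final byte INT32_TAG = 1;

    // Style 1: write the tag and the value straight to the output.
    static void writeDirect(int value, DataOutput out) throws IOException {
        out.writeByte(INT32_TAG);
        out.writeInt(value);
    }

    // Hypothetical reusable value holder, in the spirit of a mutable AType object.
    static final class MutableInt32 {
        int value;

        MutableInt32 set(int v) {
            value = v;
            return this;
        }
    }

    // Hypothetical serializer: the byte layout lives in one place.
    static final class Int32Serde {
        void serialize(MutableInt32 v, DataOutput out) throws IOException {
            out.writeByte(INT32_TAG);
            out.writeInt(v.value);
        }
    }

    // Style 2: one holder and one serializer, reused for every parsed value.
    private final MutableInt32 holder = new MutableInt32();
    private final Int32Serde serde = new Int32Serde();

    void writeViaSerde(int value, DataOutput out) throws IOException {
        serde.serialize(holder.set(value), out);
    }
}
```

The maintainability argument in the thread is visible here: in style 2 the layout is defined once in the serializer, whereas style 1 repeats it at every call site.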

2. RecordType validation after the parser but before adding to the frame?
My observation is that after the parser finishes writing the output and
passes it to the next level, there is no validation that checks whether a
non-optional field is null. In other words, the parser has to guarantee
that the processed records match the dataset definition (a non-optional
attribute cannot have a null value). I tried to assign null values to
non-null attributes. The records were inserted successfully, but reading
them back caused problems.
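A minimal sketch of the missing check described in question 2: before a parsed record is appended to a frame, collect any non-optional fields that are null or missing. `NonOptionalCheck`, `FieldDecl`, and `violations` are hypothetical names, and records are modeled as `Map<String, Object>` rather than AsterixDB's serialized format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NonOptionalCheck {
    // Hypothetical field declaration: a name plus whether it is optional (nullable).
    static final class FieldDecl {
        final String name;
        final boolean optional;

        FieldDecl(String name, boolean optional) {
            this.name = name;
            this.optional = optional;
        }
    }

    // Returns names of non-optional fields that are null or absent.
    // Map.get returns null both for an absent key and for a key mapped to null.
    static List<String> violations(List<FieldDecl> type, Map<String, Object> record) {
        List<String> bad = new ArrayList<>();
        for (FieldDecl f : type) {
            if (!f.optional && record.get(f.name) == null) {
                bad.add(f.name);
            }
        }
        return bad;
    }
}
```

A non-empty result would be the point to reject the record; whether that should happen in the parser or in a later stage is exactly what the replies above discuss.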

3. Set to null or skip
For optional (nullable) attributes, if I want to insert a record with a
null value for that attribute, should I assign a null value or just skip
it? (Probably this is related to the missing attribute that Yingyi
mentioned today?)

Thanks for your help.

Best,
Xikui


