Hi Wes,

Nice to connect with you too. We are happy to have your input on Albis and
Arrow. Specifically:

- We understand that Arrow is not a file format, but we chose to evaluate
it in a mix with storage formats as Arrow is designed for in-memory
columnar storage. The "in-memory" aspect of it is closer to flash/NVMe than
disks in terms of performance. And personally I was curious to try out
Arrow :) We coded a simple benchmark (how fast one can materialize values)
because anything more complicated like relational queries would bring
complexity from the underlying SQL engine.

- Yes, I will make it clear that the Arrow performance evaluated in the
blog is for the less-traveled on-heap Java path.

Now coming to the interesting bit: I can help investigate Arrow storage
performance tuning (on HDFS or Crail). This is a good starting point. I
will update you all on the Crail and Arrow mailing lists. Beyond
performance, the multi-file storage model is what interests me most. It
will help us explore how different file types (column groups, metadata)
can be mapped to the different storage types (NVMe, DRAM, 3DXP) that Crail
supports. I think this is an interesting avenue to explore.

Wes and Julian - thanks for the discussion.

Cheers,
--
Animesh

On Wed, Sep 5, 2018 at 8:57 PM Julian Hyde <[email protected]> wrote:

> Animesh,
>
> Thanks for your thoughtful response.
>
> I think we’re now on the same page about the opportunities for
> collaboration. And I saw that Wes posted to this thread too. I hope you
> find ways to make Arrow and Crail work well together.
>
> Julian
>
>
> > On Sep 5, 2018, at 3:49 AM, Animesh Trivedi <[email protected]> wrote:
> >
> > Hi Julian,
> >
> > Thanks for posting your thoughts.
> >
> > [As a Crail committer]: We agree that the notion of "we" creates
> > confusion. The Crail blog follows the trend in community projects, where
> > a blog post falls into one of two categories. In the first type, a
> > developer talks about recent improvements, features, performance
> > evaluation, etc. In the second type, "a user" presents how they used the
> > system for their use case. The Albis blog post falls into the second
> > category. We can (and, for future reference, should) categorize and mark
> > posts clearly that way. We would also encourage anyone in the community
> > who tries Crail to reach out to us and present their story on the Crail
> > blog. Crail is committed to providing the best possible performance to
> > all its users, be it Albis, Arrow, ORC, or Parquet.
> >
> > [As a developer of Albis and user of Crail]: I understand your sentiment
> > regarding the format wars, and it is not the aim of Albis to establish
> > yet another file format. Albis started as a prototype to quickly
> > "explore" various design choices for storing relational data for a
> > variety of scenarios with high-performance storage/networking devices,
> > the kind of devices Crail targets. This is something that I cannot
> > easily do with Arrow, ORC, or Parquet with HDFS (or something similar)
> > within a reasonable effort and time-frame, as they all have already
> > chosen certain design points and trade-offs. Crail and Albis are not
> > tied to each other (nor is either preferred over other choices), though
> > since they come from the same set of developers, I can see why the
> > confusion arises. Having said this, I will be happy to contribute the
> > findings from Albis back to the Arrow community, and would appreciate
> > any help with that. I had a brief discussion with Julien Le Dem at the
> > last DataWorks Summit in San Jose about Albis as well. I have not done a
> > thorough investigation of Arrow over Crail, but perhaps that is
> > something that can be picked up now as a starting point.
> >
> > I hope this clarifies the confusion. We will fix the blog post.
> >
> > Thanks,
> > --
> > Animesh
> >
> > On Tue, Sep 4, 2018 at 9:59 PM Julian Hyde <[email protected]> wrote:
> >
> >> I just read the blog post [1] about Crail and file formats. (I have to
> >> declare my interests up front: I have been a huge supporter of Apache
> >> Arrow, and I am a PMC member. I’m speaking here as an Arrow contributor
> >> and enthusiast, not as a mentor of Crail.)
> >>
> >> I am a bit troubled about the endorsement of Albis in a Crail blog post.
> >> For example, "we have developed a new file format called Albis". Since
> >> the blog post is not signed, I take it that "We" means the authors of
> >> the paper [2] mentioned in the blog post. But I hope that "we" does not
> >> mean "we as Crail committers and PMC members".
> >>
> >> I know that there are different forces at play if you work for a
> >> corporation, or are a researcher, or are an idealistic open source. As a
> >> researcher, you need to invent new stuff and prove that it is better
> >> than everything that has been done before.
> >>
> >> But I’ve been through the file format wars (ORC vs Parquet), driven in
> >> large part by two competing vendors. It was sickening, and a huge waste
> >> of effort. Please, please don’t let this happen again. If you want to
> >> make Crail successful, you should make it absolutely clear to the Arrow,
> >> ORC and Parquet communities that you will help to make Crail work as
> >> well as it possibly can.
> >>
> >> Also, on paper Albis looks very similar to Arrow, and the performance
> >> gap is fairly narrow. If you have found insights that would improve
> >> Arrow, I encourage you to share them and make Arrow better. It may be
> >> good research practice to accentuate the differences between the two,
> >> but it’s good open source practice to find consensus between
> >> technologies, and merge communities. There is a lot of work to be done,
> >> and too few people to do it.
> >>
> >> Lastly, I know I seem to be giving mixed messages here. I do believe
> >> that content about Crail will help drive engagement and build community
> >> (controversial content even more so). I am delighted that the Crail
> >> team is writing blog posts and posting them to Twitter. But be careful
> >> not to alienate communities that could help Crail gain widespread
> >> adoption.
> >>
> >> Julian
> >>
> >> [1] http://crail.incubator.apache.org/blog/2018/08/sql-p1.html
> >>
> >> [2] https://www.usenix.org/conference/atc18/presentation/trivedi
>
>
