Hi, Aaron --

I can't speak to issues relevant to Spark, but it looks like json4s is
currently using the Jackson Scala module 2.1.3 and Scala 2.9.2.  There have
been quite a few significant changes to the Scala module and its underpinnings
between the 2.1.x and 2.3.x series, but I can't speak to how that interacts
with json4s.  Many of those changes are conveniences for using the Jackson
Scala module directly to bind case classes transparently, but you wouldn't
need or benefit from those through the json4s API.  (FWIW, we use Jackson
Scala 2.3.2 in our Spark jobs to bind lines of JSON from text files to case
classes.)

I'll reach out to json4s and see if I can get them to update to the 2.3.x
Jackson series and Scala 2.10, but I think it makes sense for Spark to just
use the released version and then update when a new json4s release is
available.

Best,
-- Paul

—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Wed, Feb 12, 2014 at 10:38 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Will, thanks for the clarifications. I think Spark's main use-case is
> "warm, small inputs" right now, but the change seems reasonable to me
> nevertheless.
>
> Paul, do you know if there are any issues relevant to Spark that we need
> from 2.3.2? We would also have to wait for json4s to release a new version
> that depends on 2.3.2, or else pull it in ourselves.
>
>
> On Wed, Feb 12, 2014 at 9:47 AM, Paul Brown <p...@mult.ifario.us> wrote:
>
> > And, with my FasterXML hat on, if you ask, you'll find the Jackson folks
> > will turn around issues quickly.  FWIW, a full-suite Jackson 2.3.2 release
> > is coming out shortly if you can wait a couple of days to pull that in.
> >
> > -- Paul
> >
> > --
> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
> >
> > On Wed, Feb 12, 2014 at 8:12 AM, Will Benton <wi...@redhat.com> wrote:
> >
> > > ----- Original Message -----
> > >
> > > > I am not sure I fully understand this reasoning. I imagine that lift-json
> > > > is only one of hundreds of packages that would have to be built if you
> > > > wanted to build all of Spark's transitive dependencies from source.
> > >
> > > This is absolutely true.  However, many of Spark's dependencies are
> > > already available in operating system distributions.  In fact, in the case
> > > I am most familiar with (packaging Spark for Fedora), Lift is the biggest
> > > one left that isn't already available or under review.
> > >
> > > > Additionally, to make sure I understand the impact -- this is only intended
> > > > to simplify the process of packaging Spark on a new OS distribution that
> > > > disallows pulling in binaries?
> > >
> > > Yes, this was my main motivation.  Since the process of building Lift and
> > > its transitive dependencies is disproportionately complex compared to how
> > > much Spark uses lift-json, I thought it would be nice to replace it with a
> > > standalone JSON library that can be built on its own.  I would argue that
> > > -- all else being equal -- it generally makes sense to make software
> > > development choices that facilitate packaging for distributions like Fedora
> > > and Debian.
> > >
> > > There are other actual and potential advantages, though; here are a few:
> > >
> > > 1.  Based on some simple timing runs I did, json4s-jackson is faster all
> > > around when running warm (i.e., on subsequent timing runs in the same VM or
> > > timing runs with enough iterations to last for more than a few seconds),
> > > slightly slower when running cold on very small parsing tasks, and
> > > significantly (~10x) faster on large parsing tasks whether cold or warm.
> > > The knee in the cold lift-json performance curve is somewhere between 2 KB
> > > and 50 KB of JSON source text.  Cold, json4s-jackson is nominally faster
> > > with a 12 KB file, 40% faster with a 50 KB file, 2.6x faster with a 500 KB
> > > file, and 10x faster with files ranging from 4 to 20 MB.  Given how Spark
> > > uses JSON at the moment, the improved large-file parsing performance seems
> > > unlikely to be a huge practical advantage for json4s-jackson, but it's
> > > worth noting.
> > > 2.  The release schedule of json4s isn't coupled to the release schedule
> > > of a larger project.
> > > 3.  json4s is intended to provide a uniform interface to Scala JSON
> > > libraries, and it provides multiple backends, which offers potential
> > > flexibility in the future.  (To be fair, this interface is heavily based on
> > > the one provided by Lift, so going from lift-json to json4s, as my patch
> > > does, is only slightly more work than switching between json4s backends
> > > would be.)
> > >
> > > Again, this change is primarily motivated by a desire to make life easier
> > > for downstream packagers, but there is no obvious downside (beyond the
> > > downsides inherent in changing library dependencies) and there are several
> > > minor advantages.
> > >
> > >
> > > best,
> > > wb
> > >
> >
>
