Hi Roman,

There are requests for supporting bigger intermediate data in MADlib.

Here is an extract from a request:

"""
Currently 1 GB is the maximum field size for any value stored in a column.
We want to increase this in GPDB to 100 GB. This would also let data
science address the issue below, and allow storing larger values in a
column, such as an XML or JSON document that exceeds 1 GB.

As a developer, I want to maintain an internal aggregate state in memory
that is larger than 1 GB, so that I can operate on larger data sets.

Notes
1) Many MADlib algorithms need to maintain large internal aggregate
states. One example is the LDA algorithm, which is limited to (number of
topics) x (vocabulary size) < ~250M entries due to the 1 GB limit. For
text analytics, this is quite restrictive.
References
[1] http://www.postgresql.org/docs/9.4/static/sql-createaggregate.html
"""

On Wed, Dec 23, 2015 at 1:17 PM, Roman Shaposhnik <[email protected]>
wrote:

> Atri,
>
> I'm curious what usage do you see for LOs when
> it comes to MADlib?
>
> Thanks,
> Roman.
>
> On Tue, Dec 22, 2015 at 7:53 AM, Atri Sharma <[email protected]> wrote:
> > Hi All,
> >
> > We are currently working on improving Greenplum's Large Object support.
> >
> > We were wondering whether MADlib could benefit from Large Objects, for
> > example by using them to hold intermediate aggregate states that are too
> > large for a single field.
> >
> > The Large Objects API is documented at
> > http://www.postgresql.org/docs/9.2/static/largeobjects.html
> >
> > Large Objects will eventually scale out in Greenplum: they will be
> > distributed across the cluster, and queries against them will be performant.
> >
> > Regards,
> >
> > Atri
>
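
P.S. For concreteness, here is a minimal sketch of the client-side Large
Objects API linked in the quoted mail, streaming data in chunks so that no
single value ever has to fit within the 1 GB field limit. The connection
string, chunk size, and loop bound are illustrative assumptions only:

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"   /* INV_READ / INV_WRITE */

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* assumed database */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Large object operations must run inside a transaction. */
    PQclear(PQexec(conn, "BEGIN"));

    Oid lobj = lo_creat(conn, INV_READ | INV_WRITE);
    int fd   = lo_open(conn, lobj, INV_WRITE);

    /* Stream the state in fixed-size chunks; only one chunk is ever
     * held in a client buffer at a time. */
    char chunk[8192];
    memset(chunk, 0, sizeof(chunk));
    for (int i = 0; i < 4; i++)   /* loop bound is illustrative */
        lo_write(conn, fd, chunk, sizeof(chunk));

    lo_close(conn, fd);
    PQclear(PQexec(conn, "COMMIT"));

    printf("wrote large object %u\n", (unsigned) lobj);
    PQfinish(conn);
    return 0;
}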
