It seems increasing the memory allocation limit is a much harder option, and the LO API is the way to do these things in PostgreSQL, so I'm wondering if MADlib can take advantage of it. A quick sketch of what the client-side API looks like is below.
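For concreteness, this is a minimal sketch of the libpq large-object interface from the docs linked further down the thread. The connection string, chunk size, and payload are placeholders, and whether this actually works for MADlib's aggregate states is exactly the open question here; the point is only that the client streams data in chunks inside a transaction, so nothing goes through the 1 GB varlena field path.

    /* Sketch: write a large intermediate state through the LO API.
     * Assumes libpq headers are installed and "dbname=test" is reachable. */
    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>
    #include <libpq/libpq-fs.h>   /* INV_READ / INV_WRITE */

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* All large-object operations must run inside a transaction. */
        PQclear(PQexec(conn, "BEGIN"));

        Oid lobj = lo_creat(conn, INV_READ | INV_WRITE); /* new LO */
        int fd   = lo_open(conn, lobj, INV_WRITE);

        /* Stream the state in chunks; the total may exceed 1 GB. */
        char chunk[8192];
        memset(chunk, 'x', sizeof(chunk));               /* stand-in payload */
        for (int i = 0; i < 4; i++)
            lo_write(conn, fd, chunk, sizeof(chunk));

        lo_close(conn, fd);
        PQclear(PQexec(conn, "COMMIT"));

        printf("wrote large object with OID %u\n", lobj);
        PQfinish(conn);
        return 0;
    }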
MADlib is currently functioning on PostgreSQL, so maybe that's the place to try it first before worrying about porting to GPDB and HAWQ, which should be doable.

Cheers,
Ivan

On Wed, Dec 23, 2015 at 1:45 PM, Caleb Welton <[email protected]> wrote:
> This is one place, however I'd have to look at the LO API to understand if
> it gets past the memory allocation limitation, and then we'd have to
> discuss design of implementation and whether it would be implemented both
> in GPDB and HAWQ - which would be a requirement for MADlib.
>
> Sent from my iPhone
>
> > On Dec 23, 2015, at 1:39 PM, Ivan Novick <[email protected]> wrote:
> >
> > Hi Roman,
> >
> > There are requests for bigger intermediate data on MADlib.
> >
> > Here is an extract from a request:
> >
> > """
> > Currently 1 GB is the max field size for any data in a column in a row.
> > We want to increase this in GPDB to 100 GB. This will also be used by
> > data science to address the issue below, and also to store in a column a
> > bigger thing like an XML or JSON doc that is larger than 1 GB.
> >
> > As a developer, I want to maintain a larger internal aggregate state in
> > memory (> 1 GB), so that I can operate on larger data sets.
> >
> > Notes
> > 1) Many MADlib algorithms need to maintain large internal aggregates.
> > One example is the LDA algorithm, which is limited to number of topics X
> > vocabulary size < ~250M due to the 1 GB limit. For text analytics, this
> > is quite restrictive.
> >
> > References
> > [1] http://www.postgresql.org/docs/9.4/static/sql-createaggregate.html
> > """
> >
> > On Wed, Dec 23, 2015 at 1:17 PM, Roman Shaposhnik <[email protected]>
> > wrote:
> >
> >> Atri,
> >>
> >> I'm curious what usage do you see for LOs when
> >> it comes to MADlib?
> >>
> >> Thanks,
> >> Roman.
> >>
> >>> On Tue, Dec 22, 2015 at 7:53 AM, Atri Sharma <[email protected]> wrote:
> >>> Hi All,
> >>>
> >>> We are currently working on making Greenplum Large Objects better and
> >>> awesome.
> >>>
> >>> We were thinking of seeing if MADlib can benefit from Large Objects
> >>> and use them in a manner which is helpful. MADlib can see if Large
> >>> Objects can be used as intermediate objects for intermediate states
> >>> that are large.
> >>>
> >>> The Large Objects API can be seen at
> >>> http://www.postgresql.org/docs/9.2/static/largeobjects.html
> >>>
> >>> Large Objects will eventually scale out in Greenplum. They will be
> >>> distributed across the cluster and queries will be performant.
> >>>
> >>> Regards,
> >>>
> >>> Atri
