Hello Ashish,

A few months ago I ran a very basic data load test against the MM MongoDB
connector, and these were the results:

*query with group by*:


*dataContext.query().from(getCollectionName()).select("id").and("name").where("foo").isEquals("bar").groupBy("name")
.execute();*

a) 5,000,000 records:

17:25 - 17:26:x Data loading..
17:26:x - 17:27:x Query performing..

b) 10,000,000 records:

17:34 - 17:38:x Data loading..
17:38:x - 17:40:x Query performing..

~~~~~~~~~~~~~~~~~~~~~~~~~

To produce the figures above, I modified the *MM MongoDB connector tests*
to load datasets of those sizes.
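
The load-then-query measurement above can be sketched roughly as follows.
This is only a stand-in under stated assumptions: it uses plain Java
collections instead of MetaModel and MongoDB, the Row class and the
generated values are hypothetical, and it counts rows per group, which the
original query does not necessarily do. It shows the two timed phases, not
the actual connector test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the connector test: plain Java collections only,
// no MetaModel and no MongoDB involved.
class LoadAndQueryTiming {

    // Hypothetical row shape mirroring the columns named in the query.
    static final class Row {
        final int id;
        final String name;
        final String foo;

        Row(int id, String name, String foo) {
            this.id = id;
            this.name = name;
            this.foo = foo;
        }
    }

    // "Data loading" phase: build n rows in memory.
    // Assumption: even ids get foo = "bar", and names cycle through 10 values.
    static List<Row> load(int n) {
        List<Row> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new Row(i, "name" + (i % 10), i % 2 == 0 ? "bar" : "baz"));
        }
        return rows;
    }

    // "Query performing" phase: keep rows where foo equals "bar",
    // group them by name, and count the rows in each group.
    static Map<String, Long> query(List<Row> rows) {
        Map<String, Long> countsByName = new TreeMap<>();
        for (Row row : rows) {
            if ("bar".equals(row.foo)) {
                countsByName.merge(row.name, 1L, Long::sum);
            }
        }
        return countsByName;
    }

    public static void main(String[] args) {
        // Record count comes from the first argument, defaulting to 1,000,000.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;

        long start = System.nanoTime();
        List<Row> rows = load(n);
        long loaded = System.nanoTime();
        Map<String, Long> groups = query(rows);
        long queried = System.nanoTime();

        System.out.printf("Data loading:     %d ms%n", (loaded - start) / 1_000_000);
        System.out.printf("Query performing: %d ms (%d groups)%n",
                (queried - loaded) / 1_000_000, groups.size());
    }
}
```

Passing the record count as the first argument (for example 5000000 or
10000000) mirrors the two runs above, although in-memory timings say
nothing about what the MongoDB connector itself would take.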

I hope the provided information will be useful for you.

See you,

2015-06-09 16:53 GMT+02:00 Ashish Mukherjee <[email protected]>:

> Thank you, Kasper. That's a good insight. Couple of further questions -
>
> 1. Any idea of the size of the largest data-sets?
> 2. Are the deployments all on Cloud or on-premise too?
>
> Regards,
> Ashish
>
> On Mon, Jun 8, 2015 at 12:48 PM, Kasper Sørensen <
> [email protected]> wrote:
>
> > Hi Ashish,
> >
> > A bit of information from our side - I represent Human Inference, a data
> > quality company owned by Neopost. MetaModel was originally founded in our
> > R&D labs :-)
> >
> > We use MetaModel in a bunch of applications, primarily:
> > DataCleaner - www.datacleaner.org - an open source data quality solution -
> > with over 15,000 registered users. To my knowledge they use all the
> > connectors and in data load sizes ranging from tiny to huge.
> > Data Improver - www.dataimprover.com - a cloud-based contact/mailing data
> > cleansing street. It's sold primarily in the UK but expanding to US,
> > Germany, NL and probably more in the future. The sources here are CSV and
> > Excel files.
> > HIquality MDM - our Master Data Management hub which is consuming data from
> > many sources, so the wide array of connectors is a huge value-add there
> > too.
> >
> > One commonality across all three applications is that they primarily use
> > MM for batch processing: typically onboarding a big load of records, doing
> > some complex processing on them, and then inserting them into a new,
> > cleansed datastore. Some of the tools obviously also do ad hoc querying
> > afterwards, but that happens in a more homogeneous environment.
> >
> > Best regards,
> > Kasper
> >
> >
> > 2015-06-08 8:18 GMT+02:00 Ashish Mukherjee <[email protected]>:
> >
> > > Hello,
> > >
> > > I am aware that a couple of companies are using MM for their production
> > > applications, as stated on this list.
> > >
> > > For a better understanding of the trends in terms of its use and scale of
> > > operation, I was wondering which connectors people are using most and
> > > what the typical data sizes being queried through MM are.
> > > Which data stores do people generally query together or use in a combined
> > > way?
> > >
> > > Would any users of MM be willing to share some info related to this?
> > >
> > > Thanks,
> > > Ashish
> > >
> >
>



-- 

Francisco Javier Cano
Senior Developer


<http://www.stratio.com/>
Vía de las Dos Castillas, 33. Ática 4. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
