What about map-reduce? Chop the job into much smaller sub-jobs and have something else (another server or PC) process them. I assume your app loads its data from a DB in pieces, rather than loading the whole 1TB and only then processing it.
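A rough sketch of that chop-and-combine idea in Python. Everything here is made up for illustration: `fetch_page` stands in for a paginated DB query, `map_chunk` for whatever your real per-chunk calculation is, and a local thread pool stands in for the other servers or PCs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def fetch_page(records, page, page_size):
    """Hypothetical stand-in for a paginated DB read (e.g. LIMIT/OFFSET)."""
    start = page * page_size
    return records[start:start + page_size]


def map_chunk(chunk):
    """Hypothetical per-chunk work: total length of the records in the chunk."""
    return sum(len(record) for record in chunk)


def combine(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def map_reduce(records, page_size=1000, workers=4):
    """Process the data page by page on a worker pool, then combine."""
    page_count = (len(records) + page_size - 1) // page_size

    def work(page):
        return map_chunk(fetch_page(records, page, page_size))

    # Local threads are only a placeholder for remote workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(work, range(page_count))
        return reduce(combine, partials, 0)
```

Only one page per worker is in flight at a time, so the full data set never has to sit in memory on any single machine.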
Something like pagination: fetch a chunk and have someone else do the calculation, then reduce (combine the results) and map again if necessary.

As for the 1MB limit, serialize the object to bytes and split it into pieces (you can come up with a small data structure to keep track of them). That is still much quicker than writing to disk. Random access to memory is, in general, roughly 100,000 times quicker than a seek on a spinning disk.

On Sep 22, 10:33 am, "MikeG." <[email protected]> wrote:
> Thanks,
>
> Sounds good. Unfortunately my situation is unique.
> I'm trying to expedite a "meta production" application. This is an
> application that create content for a production application. The
> creation is slow because there is a lot of big records that need to
> be updated and fetched.
>
> Looks like I can't avoid a cluster cache because I need 1TB and
> growing DB. This means at least 16 x 64GB machines. There is no
> justification to have such cluster for non-production engine.
>
> I'm back to look at the architecture to get some more speed.
>
> On Sep 22, 6:12 am, ligerdave <[email protected]> wrote:
>
> > MongoDB is actually "cached" db, meaning that, most of its records
> > are in memory.
> >
> > I think there is also a memcached and DB hybrid which comes w/ a
> > persistent option. i think it's called memcachedDB, which runs a
> > in-memory db(like mongodb). this shares most of common api w/
> > memcached so you dont have to change code very much
> >
> > On Sep 21, 2:11 am, Joseph Engo <[email protected]> wrote:
> >
> > > Yes, MongoDB is a database.
> > >
> > > You mentioned blobs so I figured you were dealing with binary
> > > objects.
> > >
> > > The problem with your use case is objects larger then 1MB. Thats
> > > really not what Memcached is intended to be used for. You could
> > > split the objects into smaller chunks. But keep in mind, no
> > > object is guaranteed to be in the cache. So if 1 chunk of your
> > > large object is missing you need to pull the entire object back
> > > from your datastore and split it back into chunks again.
> > > Sure, its very rare that it would happen but it does.
> > >
> > > What happens if you need to reboot your server for patches or you
> > > have a crash. You will need to warm up the cache and that could
> > > take a while depending on how you assemble the objects.
> > >
> > > It really sounds to be that you should do optimizations on your
> > > DB layer first before just throwing a cache at it. :)
> > >
> > > On Mon, Sep 20, 2010 at 10:54 PM, MikeG. <[email protected]> wrote:
> > >
> > > > First, thanks for taking the time.
> > > >
> > > > Now, MongoDB is a cache engine? I think it's a DB?
> > > >
> > > > I need a cache engine that can handle (some) objects bigger
> > > > then 1MB. I don't have binary data. All I have is ascii
> > > > elements. My final product is an aggregation of meta data of
> > > > many binary code pieces.
> > > >
> > > > Just for context - my company has many products while each may
> > > > have several hardware versions. Each version has it's own
> > > > binary image. If you have one of our products and you need to
> > > > upgrade the software you get the binary from on place (I have
> > > > nothing to do with it) and a build meta data from another
> > > > place. It's a scheme to have only the legitimate users able to
> > > > upgrade.
> > > >
> > > > My application builds or assembles all the various products
> > > > builds meta information and download it to the distribution
> > > > cluster.
> > > >
> > > > Currently my application takes forever because everything needs
> > > > to be read from the DB. Caching it will do wonders. I'm sure
> > > > about that. Profiling my application I found that the total
> > > > time spent on DB reading is hugh.
> > > >
> > > > Do you see any problem using memcached?
> > > >
> > > > Thanks,
> > > > -Michael
> > > >
> > > > On Sep 20, 10:18 pm, Joseph Engo <[email protected]> wrote:
> > > >
> > > > > If you need larger than 1MB objects and you are only serving
> > > > > these objects to 5 users at a time. It really sounds like
> > > > > memcache is the wrong tool for your project. You might want
> > > > > to look into something like MongoDB which has a larger object
> > > > > limit of 4MB. There are a number of key value stores that can
> > > > > handle even larger size objects.
> > > > >
> > > > > Could you explain a little more on what type of binary data
> > > > > you are manipulating ?
> > > > >
> > > > > On Mon, Sep 20, 2010 at 8:25 PM, MikeG. <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm starting a project in which I would like to have the
> > > > > > entire DB in cache. The reason is that my transactions are
> > > > > > reading large amount of data from the DB to generate a
> > > > > > deliverable blob.
> > > > > >
> > > > > > I have no concurrency issues. I will always have around 5
> > > > > > users max concurrently and most of the time it will be
> > > > > > single user. SO I look at the context as virtually single
> > > > > > user.
> > > > > >
> > > > > > To avoid this massive DB reading (of large count of large
> > > > > > chunks) I would like to have it permanently in memory.
> > > > > > When any modification of a record happened it is not being
> > > > > > written to the DB until the big blob final product is
> > > > > > delivered and the local system goes idle. Only then
> > > > > > modifications are written to the DB. Upon successful
> > > > > > completion of DB update the local system sends a message to
> > > > > > the recipient of the product to inform it that the DB is
> > > > > > now in sync with the product at hand and it can be
> > > > > > consumed.
> > > > > >
> > > > > > Now, to hold 1TB in memory I need a cluster and not a small
> > > > > > one. I have decided to use memory mapped files such that my
> > > > > > RAM is virtual memory. Easy to get large file system of
> > > > > > several TBs.
> > > > > >
> > > > > > My question - is there any limit memcached has as far as
> > > > > > cache size?
> > > > > >
> > > > > > Also, memcached (the C implementation) has a 1MB record
> > > > > > size limit. 1) What's the reason for that? 2) Can it be
> > > > > > changed (with a hacked private version) 3) Does Jmemcached
> > > > > > has the same limit?
> > > > > >
> > > > > > Thanks,
> > > > > > -Michael
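
For the 1MB limit and the missing-chunk problem, here is a minimal sketch of the split-into-pieces idea in Python. It is only an illustration under a few assumptions: `cache` is a plain dict standing in for a memcached client, the `key:index` naming and the chunk-count entry under the bare key are a made-up layout, and `CHUNK_SIZE` is picked to stay a bit under 1MB so there is room for the key and item overhead.

```python
import pickle

# Assumed chunk size: a little under the 1MB item limit.
CHUNK_SIZE = 1000 * 1000


def put_chunked(cache, key, obj, chunk_size=CHUNK_SIZE):
    """Serialize obj, split the bytes into chunks, store each under key:index."""
    data = pickle.dumps(obj)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        cache[f"{key}:{index}"] = chunk
    # Store the chunk count last, so a reader never sees a count
    # before the chunks it refers to have been written.
    cache[key] = len(chunks)
    return len(chunks)


def get_chunked(cache, key):
    """Reassemble the object, or return None if any piece is missing."""
    count = cache.get(key)
    if count is None:
        return None
    parts = []
    for index in range(count):
        part = cache.get(f"{key}:{index}")
        if part is None:
            # One evicted chunk makes the whole object a cache miss.
            return None
        parts.append(part)
    return pickle.loads(b"".join(parts))
```

A `None` from `get_chunked` is the case Joseph describes: pull the whole object from the DB again and call `put_chunked` to re-split it.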
