I'll subside after one minor note on the "sky is the archive." I once had a course from W. W. Morgan, the U. Chicago prof who developed the atlas of stellar types (A, O, B, etc.). He had the spectrum of a "standard type R". As I recall, two weeks after he published his atlas with the spectra, the star defining the type became a variable.
Also, I note that on this very Google Mail page, I can get a "Free Guide to Big Data", as well as the "IBM Big Data Free eBook". I suppose I don't need to go to a conference to become informed.

Bruce B.

On Wed, Mar 20, 2013 at 10:21 AM, Mattmann, Chris A (388J) <[email protected]> wrote:

> Hey Bruce,
>
> A couple points:
>
> On 3/20/13 5:46 AM, "Bruce Barkstrom" <[email protected]> wrote:
>
> >That may be a bit better.
> >
> >However, it still isn't clear to me how the physics of the instruments
> >and of the data processing gets into what users understand they
> >can do with the data.
>
> Yeah agreed. At the same time, this is kind of difficult to throw into
> a 45 min with 15 mins "techie talk" that I haven't even prepared yet,
> and even harder to throw in to a 100 word (what you see on the website)
> and 200 word (longer, what I sent you) abstract that they requested.
>
> >As I understand Big Data and analytics, it usually appears to be using
> >a lot of statistics to find unexpected correlations in the data, but
> >the techniques aren't looking for causation. If you're dealing with
> >scientific data, you're usually trying to get to physical causation.
> >That means, I think, that users need to understand how the
> >physics and math constrain what they can do.
>
> ++50 agreed.
>
> >Let me see if I can identify a more concrete example of a
> >concern. Usually, when we want to deal with physically
> >connected phenomena, we want disparate data to be
> >observing the same chunk of space at the same time.
> >If the Big Data user picks up one piece of data from region
> >X_1 and t_1 and then develops a correlation with observations
> >with data from X_2 and t_2, where X_1 /= X_2 and t_1 /= t_2,
> >it isn't clear why that correlation has anything to do with
> >physical causation.
> >Or, to put it another way, Big Data
> >may just give more examples of the "cherry picking"
> >climate deniers do when they select data without
> >paying attention to the statistical and physical significance
> >of their "results".
>
> Totally agree. This is the big difference between card-carrying
> statisticians a lot of the time and *computer science*
> oriented *machine learning* people.
>
> >So, even though the data rates are large by today's
> >standards, I'm not sure that, by itself, is impressive.
>
> Well I have to say it is impressive. Can you show me a disk
> that can today write 700 TB of data per second? Or the filesystem
> drivers and parallel I/O necessary to software them? Imagine in
> astronomy, where they are moving into the time domain, and
> away from the "sky is the archive" "so just reobserve next
> time" mentality, and thus triage, which is super important,
> isn't the main driver and archival is now becoming important,
> and necessary in these eventually 700TB/sec producing systems.
>
> There are all sorts of IO, hardware, computer science, and
> other advances that we don't have that are needed, and that
> these types of examples like the SKA will drive.
>
> OTOH, the sheer infrastructure, domestic and international policy,
> investment, and excitement and sense of nationality that many of
> these new Big Data systems (especially the SKA) are creating in
> their respective countries (e.g., in South Africa), is enough
> to at least suggest to my evidence based mind that there is
> something impressive here.
>
> >Maybe the relevant example would be all those statistics
> >on dams built or tons of steel produced by the Soviet
> >Union. The hype would be more interesting if it could
> >talk about what new phenomena or understanding
> >these techniques will produce - not just the data rate
> >or the total amount of data being produced.
>
> Agreed, lots of data has been generated for a while.
> However, the volume (total and discrete), velocity, and variety (in
> data types, metadata, etc.) are certainly such that they are
> worthy of current study, at least in the area of data management.
>
> >Maybe it's just a glorified popularity contest; if so,
> >it would seem to be at about the level of interest
> >of the new season of "Dancing with the Stars".
>
> Perhaps, but I know you guys are interested in that show :)
> Who's not?
>
> >I suppose the hype is necessary to generate the
> >funding (which has its uses), but I'm not sure it
> >will do as much as a few million sent to appropriate
> >super PACs to move the politics of climate change
> >along.
>
> Think of this as an IT super PAC for next generation data management
> techniques and systems to deal with data volumes and varieties that
> we don't have hardware or CS tools to manage yet. I'm not talking
> about writing to tape and letting it die in the morgue. I'm talking about
> even simple things like making it available after you write it to spinning
> disk.
>
> Cheers,
> Chris
>
> >Bruce B.
> >
> >On Wed, Mar 20, 2013 at 1:16 AM, Mattmann, Chris A (388J) <[email protected]> wrote:
> >
> >> Hey Bruce,
> >>
> >> Hah!
> >>
> >> Unfortunately all you get is the short summary through
> >> the website which does make it scientifically hard to
> >> judge, however, then again this isn't science, it's a
> >> glorified popularity contest.
> >>
> >> I have a little bit more detailed abstract that I wrote up,
> >> pasted below (of course the part that they don't use to solicit votes):
> >>
> >> ---longer abstract
> >> The NASA Jet Propulsion Laboratory, California Institute of
> >> Technology contributes to many Big Data projects for Earth science
> >> such as the U.S. National Climate Assessment (NCA) and for astronomy
> >> such as next generation astronomical instruments like the Square
> >> Kilometre Array (SKA) that will generate unprecedented volumes of
> >> data (700TB/sec!).
> >>
> >> Through these projects, we are addressing four key
> >> challenges critical for the Hadoop community and broader open source
> >> Big Data community to consider: (1) unobtrusively integrating science
> >> algorithms into large scale processing systems; (2) selecting and
> >> deploying high powered data movement technologies for data staging
> >> and remote data acquisition, processing, and delivery to our customers
> >> and users; (3) better leveraging of cloud computing (storage and
> >> processing) technologies in NASA missions; and (4) technologies for
> >> automatically and rapidly extracting text and metadata from the file
> >> formats, by some estimates ranging from a few thousand to over fifty
> >> thousand in total.
> >>
> >> This talk will focus on those Big Data challenges, how NASA
> >> JPL is addressing them both technologically (Hadoop, OODT, Tika,
> >> Nutch, Solr) and from a community standpoint (Apache, interacting
> >> with open source, etc.). I'll also discuss the future of Big Data at
> >> JPL and NASA and how others can get involved.
> >> -----
> >>
> >> You can think of that as the longer version of what I submitted. *grin*
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 3/19/13 7:20 PM, "Bruce Barkstrom" <[email protected]> wrote:
> >>
> >> >OK, so you've got a three-word summary of some
> >> >hyperbole with Dumbo, the Flying Elephant.
> >> >How are you going to deal with the real
> >> >scientific constraints on the physics of combining real
> >> >measurement technologies and "mashing stuff together"?
> >> >
> >> >You need to remember that imaging instruments integrate
> >> >radiances with spectral responses and Point Spread Function
> >> >weighted averages over the FOV of whatever the instrument
> >> >was looking at - and that's just the instantaneous (L1 measurement).
> >> >If you do orthorectification, you've got variations in the uncertainties
> >> >across the image where the parts of the image where you've
> >> >increased the resolving power (by putting interpolated points
> >> >closer together) and have also increased the noise from the
> >> >orthorectification process that acts as a noise multiplier.
> >> >
> >> >Next, you've got stuff like cloud identification (and rejection or
> >> >acceptance) - which depends on spectral response, solar illumination
> >> >(during the day) and temperature and cloud property stuff during
> >> >the night - and finally, you've got temporal interpolation (not just
> >> >creating an average through emission driven by solar illumination
> >> >during the day and IR cooling at night. Where (the hel)l is
> >> >the physics that deals with this stuff? If you do get some
> >> >statistical stuff, why should anyone believe it contributes to
> >> >our understanding of climate change?
> >> >
> >> >I won't vote, but you can think of this as my input to your
> >> >scientific conscience.
> >> >
> >> >Bruce B.
> >> >
> >> >On Tue, Mar 19, 2013 at 7:51 PM, Mattmann, Chris A (388J) <[email protected]> wrote:
> >> >
> >> >> Hey Guys,
> >> >>
> >> >> I proposed a talk for NASA and Big Data at the Hadoop Summit:
> >> >>
> >> >> http://hadoopsummit2013.uservoice.com/forums/196822-future-of-apache-hadoop/suggestions/3733470-nasa-science-and-technology-for-big-data-junkies-
> >> >>
> >> >> If you still have votes, and would like to support my talk, I'd
> >> >> certainly appreciate it!
> >> >>
> >> >> Thank you for considering.
> >> >>
> >> >> Cheers,
> >> >> Chris Mattmann
> >> >> Vote Herder
