Hi all,

With reference to the ECI letter thread, I feel it is time to revive this thread.

On Tuesday, May 27, 2014 8:46:27 AM UTC+5:30, Nisha Thompson wrote:
> Thanks Dilip,
>
> Those 5 points are right on. I would also add a point about ownership
> and licensing.
>
> I think formats would be a good conversation to have.
>
> Comments on the others below.
>
> On 26.05.2014 06:50, Dilip Damle wrote:
>>> > Hello,
>>> >
>>> > I think we need to discuss the following:
>>> >
>>> > 1. When is the data eligible to go to the Repository
>>> >
>>> > There could be several factors here, mainly cleanliness and
>>> > completeness.
>
> 1) I would like to figure out what a good threshold of cleanliness and
> completeness is. I think robust metadata is important for that.
>
>>> > 2. Place other than the Repository for temporary data
>>> >
>>> > I think it should surely not be "only an attachment to a post here".
>>> > Then it becomes difficult to find later.
>>> > Administrators should decide on a suitable place.
>
> A temporary file isn't a bad idea. Maybe a Google Drive or Dropbox run
> by DataMeet could do that. We can put up the tasks for each dataset on
> the web, ask people to clean up, and then give access?
>
>>> > 3. The particular formats themselves
>>> >
>>> > This could vary based on the type of data.
>>> >
>>> > My observation is that for many types of data, multiple linked
>>> > tables serve better than a single CSV file, which is more common.
>>> > In this case, is .mdb acceptable, or is there any other open format
>>> > for linked tables?
>>> >
>>> > This could be a long topic...
>>> >
>>> > 4. Compressing multiple files into one file
>>> >
>>> > Unless there is a reason not to, multiple files that go together
>>> > should be bundled into one file.
>>> > This should also be true for the repository.
>>> >
>>> > 5. About the content itself
>>> >
>>> > Since multiple people will contribute to and edit the data, we will
>>> > have to have some rules.
>>> > Example: when there is a unique identifier for the data, it should
>>> > always be used; otherwise combining or comparing the data becomes
>>> > difficult.
>>> > (At present I am trying to collate the election results data and
>>> > find there are differences between the sources, especially in the
>>> > names of places. I will be putting up the collated data in .mdb
>>> > format in a few days.)
>
> I'm going to think about it for a bit, but I think standardization is a
> really important task that requires a larger discussion.
>
>>> > On Friday, May 23, 2014 10:06:35 AM UTC+5:30, Nisha Thompson wrote:
>>> >
>>> > In the discussion guidelines thread Dilip suggested we have some
>>> > data sharing guidelines and a place to store some of the more
>>> > casual datasets people are cleaning up.
>>> >
>>> > I think it's a good idea.
>>> >
>>> > Can we use this thread as a place to discuss formats, procedure,
>>> > and a good place to put it?
>>> >
>>> > We have a GitHub already set up; we can start with that, maybe
>>> > create a project called - Data that needs to be cleaned up.
>>> >
>>> > Any other suggestions?
>>> >
>>> > Nisha
>>> >
>>> > --
>>> > Nisha Thompson
>>> > DataMeet.org
>>> > ni...@datameet.org
>>> > skype: nishaqt
>>> > mobile: 962-061-2245
>>> >
>>> > --
>>> > Datameet is a community of Data Science enthusiasts in India. Know
>>> > more about us by visiting http://datameet.org
>>> > ---
>>> > You received this message because you are subscribed to the Google
>>> > Groups "datameet" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to datameet+u...@googlegroups.com.
>>> > For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
>>> Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
>>> Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind
>>>
>>> Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)
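
On points 3 and 5 of the quoted discussion (linked tables versus a single CSV, and always using a unique identifier), here is a minimal sketch of what linked tables could look like in SQLite, which is one open, single-file format that might be considered alongside .mdb. This is only an illustration, not a proposal agreed on in the thread: the table names (`constituency`, `result`), the column names, the codes such as `C-01`, and every row of data are made up for the example. It uses Python's standard `sqlite3` module.

```python
import sqlite3


def build_example_db(conn):
    """Create two linked tables and load a few made-up rows."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE constituency (
            code TEXT PRIMARY KEY,   -- hypothetical unique identifier
            name TEXT NOT NULL
        );
        CREATE TABLE result (
            id INTEGER PRIMARY KEY,
            constituency_code TEXT NOT NULL REFERENCES constituency(code),
            candidate TEXT NOT NULL,
            votes INTEGER NOT NULL
        );
        """
    )
    # Place names are stored once, so spelling variants cannot creep
    # into the results table.
    conn.executemany(
        "INSERT INTO constituency (code, name) VALUES (?, ?)",
        [("C-01", "Example Place A"), ("C-02", "Example Place B")],
    )
    conn.executemany(
        "INSERT INTO result (constituency_code, candidate, votes) VALUES (?, ?, ?)",
        [
            ("C-01", "Candidate X", 100),
            ("C-01", "Candidate Y", 200),
            ("C-02", "Candidate Z", 50),
        ],
    )
    conn.commit()


def votes_by_constituency(conn):
    """Join the two tables on the unique code and total the votes."""
    cursor = conn.execute(
        """
        SELECT c.code, c.name, SUM(r.votes)
        FROM constituency AS c
        JOIN result AS r ON r.constituency_code = c.code
        GROUP BY c.code, c.name
        ORDER BY c.code
        """
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_example_db(connection)
    for row in votes_by_constituency(connection):
        print(row)
```

Because the results refer to places only by code, two sources that spell a place name differently can still be combined as long as they agree on the code, and a row carrying an unknown code is rejected at insert time instead of silently creating a new place.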