On Jul 23, 2014, at 5:29 PM, Kyle Banerjee wrote:

> We've been facing increasing requests to help researchers publish datasets.
> There are many dimensions to this problem, but one of them is applying
> appropriate metadata and mounting them so they can be explored with a
> regular web browser or downloaded by expert users using specialized tools.
> 
> Datasets often are large. One that we used for a pilot project contained
> well over 10,000 objects with a total size of about 1 TB. We've been asked
> to help with much larger and more complex datasets.
> 
> The pilot was successful but our current process is neither scalable nor
> sustainable. We have some ideas on how to proceed, but we're mostly making
> things up. Are there methods/tools/etc you've found helpful? Also, where
> should we look for ideas? Thanks,


The tools I use are too customized for our field to be of much use to anyone 
else, so I can't help with that part of the question.


I'd really recommend reaching out to someone working in data informatics in 
the field the data comes from, as they would be able to recommend the specific 
metadata that should be captured.


The general 'data publication' community is coalescing, but it's still a bit 
all over the place.  Here are some of the venues that I know about:

        JISC has a 'Data Publication' mailing list:

                https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=DATA-PUBLICATION
        
        ASIS&T runs a 'Research Data Access & Preservation' conference and mailing list:

                http://www.asis.org/rdap/
                http://mail.asis.org/mailman/listinfo/rdap

        ... and they put most of the presentations up on slideshare:

                http://www.slideshare.net/asist_org/

        The Research Data Alliance has two working groups on the topic, Publishing Services and Publishing Data Workflows:

                https://rd-alliance.org/group/rdawds-publishing-services-wg.html
                https://rd-alliance.org/group/rdawds-publishing-data-workflows-wg.html


I'm also one of the moderators of the Open Data site on Stack Exchange, which 
has some questions that might be relevant:

        Let's suppose I have potentially interesting data. How to distribute?
                http://opendata.stackexchange.com/q/768/263
        
        Benefits of using CC0 over CC-BY for data
                http://opendata.stackexchange.com/q/26/263

        ... or just ask a new question.


I'd also recommend that when you catalog your data, you consider adding 
DataCite metadata, so that we can make it easier for others to cite your data.  
(Specific implementation recommendations for data citation are still evolving, 
but general principles have been released; if you have questions, feel free to 
ask me, as I think we need to clarify what we mean by some of the items.)

        http://www.datacite.org/
        https://www.force11.org/datacitation
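For a sense of what the minimum looks like, here's a rough sketch in Python that builds a DataCite record containing only the mandatory properties (identifier, creator, title, publisher, publication year, resource type). It assumes the DataCite Metadata Schema "kernel-4" namespace; check the schema version your DOI provider expects. The DOI, names, and publisher in the usage line are hypothetical placeholders, and build_datacite_xml is just an illustrative helper, not part of any DataCite tool.

```python
import xml.etree.ElementTree as ET

# Assumption: DataCite Metadata Schema kernel-4 namespace.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def _q(tag):
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{DATACITE_NS}}}{tag}"


def build_datacite_xml(doi, creators, title, publisher, year,
                       resource_type="Dataset"):
    """Build a minimal DataCite record with only the mandatory properties.

    Hypothetical helper for illustration; a real record would usually
    carry more (subjects, descriptions, rights, related identifiers...).
    """
    # Serialize the DataCite namespace as the default (unprefixed) one.
    # Note: this registration is global to the ElementTree module.
    ET.register_namespace("", DATACITE_NS)

    resource = ET.Element(_q("resource"))

    identifier = ET.SubElement(resource, _q("identifier"),
                               identifierType="DOI")
    identifier.text = doi

    creators_el = ET.SubElement(resource, _q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, _q("creator"))
        ET.SubElement(creator, _q("creatorName")).text = name

    titles = ET.SubElement(resource, _q("titles"))
    ET.SubElement(titles, _q("title")).text = title

    ET.SubElement(resource, _q("publisher")).text = publisher
    ET.SubElement(resource, _q("publicationYear")).text = str(year)

    rtype = ET.SubElement(resource, _q("resourceType"),
                          resourceTypeGeneral=resource_type)
    rtype.text = resource_type

    return ET.tostring(resource, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values; 10.1234 is not a real registered prefix.
    print(build_datacite_xml("10.1234/example-dataset", ["Doe, Jane"],
                             "Example Pilot Dataset", "Example University",
                             2014))
```

The point is less the code than the checklist: if you're capturing those six fields in your catalog anyway, mapping them to DataCite is cheap, and everything else can be layered on later.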


As I see it, you're dealing with data that's in the problem range -- if it were 
larger, the department collecting the data would have a system in place 
already; if it were smaller, it's easier to manage as a single item for deposit.


-Joe
