On Jul 23, 2014, at 5:29 PM, Kyle Banerjee wrote:
> We've been facing increasing requests to help researchers publish datasets.
> There are many dimensions to this problem, but one of them is applying
> appropriate metadata and mounting them so they can be explored with a
> regular web browser or downloaded by expert users using specialized tools.
>
> Datasets often are large. One that we used for a pilot project contained
> well over 10,000 objects with a total size of about 1 TB. We've been asked
> to help with much larger and more complex datasets.
>
> The pilot was successful but our current process is neither scalable nor
> sustainable. We have some ideas on how to proceed, but we're mostly making
> things up. Are there methods/tools/etc you've found helpful? Also, where
> should we look for ideas? Thanks,

The tools I use are too customized for our field to be of much use to anyone
else, so I can't help with that part of the question.

I'd strongly recommend reaching out to someone working in data informatics in
the field the data comes from, as they can recommend the specific metadata
that should be captured.

The general 'data publication' community is coalescing, but it's still a bit
all over the place. Here are some of the venues I know about:

JISC has a 'Data Publication' mailing list:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=DATA-PUBLICATION

ASIS&T runs a 'Research Data Access & Preservation' conference and
mailing list:
http://www.asis.org/rdap/
http://mail.asis.org/mailman/listinfo/rdap
... and they put most of the presentations up on SlideShare:
http://www.slideshare.net/asist_org/

The Research Data Alliance has two working groups on the topic,
Publishing Services and Publishing Data Workflows:
https://rd-alliance.org/group/rdawds-publishing-services-wg.html
https://rd-alliance.org/group/rdawds-publishing-data-workflows-wg.html

I'm also one of the moderators of the Open Data site on Stack Exchange, which
has some questions that might be relevant:

Let's suppose I have potentially interesting data. How to distribute?
http://opendata.stackexchange.com/q/768/263

Benefits of using CC0 over CC-BY for data
http://opendata.stackexchange.com/q/26/263

... or just ask a new question.

I'd also recommend that when you catalog your data, you consider adding
DataCite metadata, so that we can make it easier for others to cite your data.
Specific implementation recommendations for data citation are still evolving,
but general principles have been released. If you have questions, feel free to
ask me, as I think we need to clarify what we mean on some of the items.
http://www.datacite.org/
https://www.force11.org/datacitation
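
As a rough illustration of what adding DataCite metadata involves, here's a
minimal Python sketch. It assumes the mandatory properties of the DataCite
Metadata Schema 3.x (Identifier, Creator, Title, Publisher, PublicationYear)
plus ResourceType, which later schema versions also require. The class and
method names are hypothetical, the citation string only approximates DataCite's
recommended "Creator (PublicationYear): Title. Publisher. ResourceType.
Identifier" pattern (omitting the optional Version), and the XML it emits has
not been validated against the schema, so check the current DataCite
documentation before relying on it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# XML namespace of the DataCite Metadata Schema 3.x kernel.
DATACITE_NS = "http://datacite.org/schema/kernel-3"


def _q(tag: str) -> str:
    """Qualify a tag name with the DataCite namespace for ElementTree."""
    return f"{{{DATACITE_NS}}}{tag}"


@dataclass
class DataCiteRecord:
    """Hypothetical container for the mandatory DataCite properties."""
    doi: str                        # Identifier (identifierType="DOI")
    creators: List[str]             # creator names, "Family, Given"
    title: str
    publisher: str
    publication_year: int
    resource_type: str = "Dataset"  # resourceTypeGeneral; mandatory in schema 4.x

    def missing_fields(self) -> List[str]:
        """List the mandatory properties that are still empty."""
        checks = [
            ("identifier", self.doi),
            ("creator", self.creators),
            ("title", self.title),
            ("publisher", self.publisher),
            ("publicationYear", self.publication_year),
        ]
        return [name for name, value in checks if not value]

    def citation(self) -> str:
        """Roughly: Creator (PublicationYear): Title. Publisher. ResourceType. Identifier"""
        creators = "; ".join(self.creators)
        return (f"{creators} ({self.publication_year}): {self.title}. "
                f"{self.publisher}. {self.resource_type}. https://doi.org/{self.doi}")

    def to_xml(self) -> str:
        """Serialize a minimal DataCite 3.x XML document (not schema-validated)."""
        ET.register_namespace("", DATACITE_NS)  # emit xmlns="..." instead of ns0:
        root = ET.Element(_q("resource"))
        ET.SubElement(root, _q("identifier"), identifierType="DOI").text = self.doi
        creators = ET.SubElement(root, _q("creators"))
        for name in self.creators:
            creator = ET.SubElement(creators, _q("creator"))
            ET.SubElement(creator, _q("creatorName")).text = name
        titles = ET.SubElement(root, _q("titles"))
        ET.SubElement(titles, _q("title")).text = self.title
        ET.SubElement(root, _q("publisher")).text = self.publisher
        ET.SubElement(root, _q("publicationYear")).text = str(self.publication_year)
        ET.SubElement(root, _q("resourceType"),
                      resourceTypeGeneral=self.resource_type)
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = DataCiteRecord(
        doi="10.5072/example-dataset-001",  # hypothetical DOI (DataCite test prefix)
        creators=["Doe, Jane", "Roe, Richard"],
        title="Example Survey Measurements, 2012-2013",
        publisher="Example University Library",
        publication_year=2014,
    )
    print(record.missing_fields())  # []
    print(record.citation())
    print(record.to_xml())
```

Running missing_fields() before minting a DOI is one cheap way to catch
incomplete records when cataloging thousands of objects in bulk. All of the
example values, including the 10.5072 test-prefix DOI, are made up.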

As I see it, you're dealing with data in the problem range: if it were larger,
the department collecting it would already have a system in place; if it were
smaller, it would be easier to manage as a single item for deposit.

-Joe