I was reading the PNAS author guidelines and I came across this gem: "Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be published in raw format and will not be edited or composed."
Did I read those last two file formats correctly? I have actually come across a dataset in supplementary information that was several dozen pages of PDF. It was effectively impossible to extract the data from this document. (I can dig it up if pressed, probably.) I had no idea that the authors might have been encouraged to submit their data like that.

Does a premier scientific journal actually request data in PDF format? I can think of dozens of other formats that would be more fitting, many of which are summarized here: http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

What is the scholarly equivalent of a torch and pitchfork march, and how can we organize such a march to encourage journals to require proper serialization formats for datasets in supplementary info?

James

P.S. I am aware that it is better to submit data to a dedicated repository, but let's consider those cases where research produces data for which there is not yet a dedicated repository.
