The following re-reads the header and builds a dictionary that maps each
converted column name back to its original name, in a one-liner-ish way:
df = readtable("/the/file.csv")
h = Dict(zip(keys(df.colindex.lookup), split(open("/the/file.csv") do f
chomp(readline(f)) ; end, ",")[collect(values(df.colindex.lookup))]))
Now, besides any other use you may have for `h`, you can do:
melteddf[:Region] = [h[r] for r in melteddf[:Region]]
to restore the original names in `melteddf`.
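
In case it helps, here is the same idea spelled out as a small sketch that uses only Base Julia (1.x syntax), so it runs without DataFrames. The names `sanitize` and `header_lookup` are made up for illustration, and `sanitize` is only a stand-in that assumes the name conversion just swaps spaces for underscores; the real conversion in readtable handles more cases. Unlike the one-liner, it also strips the whitespace around each header field.

```julia
# Hypothetical stand-in for readtable's name conversion.
# Assumption: it only replaces spaces with underscores.
sanitize(name) = Symbol(replace(strip(name), " " => "_"))

# Read the header line and map each converted name to the original text.
function header_lookup(io::IO)
    originals = [String(strip(s)) for s in split(readline(io), ",")]
    return Dict(sanitize(s) => s for s in originals)
end

# Demo on an in-memory copy of the example file from the question.
csv = "Year,Name of country 1, Name of country 2\n1950, 5., 6.\n1951, 6., 8.\n"
h = header_lookup(IOBuffer(csv))

# A melted "Region" column would hold the converted names; map them back.
regions = [:Name_of_country_1, :Name_of_country_2, :Name_of_country_1]
fixed = [h[r] for r in regions]
```

With a real file you would pass an open file handle instead of the `IOBuffer`, e.g. `open(header_lookup, path)`.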
On Wednesday, December 16, 2015 at 2:39:57 AM UTC+2, David Anthoff wrote:
>
> Hi,
>
> I have a csv file that roughly looks like this:
>
> Year,Name of country 1, Name of country 2
> 1950, 5., 6.
> 1951, 6., 8.
>
> The real file has more columns and rows.
>
> I want to bring this into tidy format, so that I have a DataFrame that
> looks like this:
>
> Year, Region, Value
> 1950, Name of country 1, 5.
> 1950, Name of country 2, 6.
> 1951, Name of country 1, 6.
> 1951, Name of country 2, 8.
>
> Right now I read the file with readtable into a DataFrame and then use
>
> melt(df, :Year)
>
> This gives me the right structure, but now all the country names are
> messed up, e.g. they look like “Name_of_country_1” instead of “Name of
> country 1”.
>
> I understand why that is the case, i.e. readtable converts strings into
> symbols and has to insert these underscores, but I’m wondering whether the
> original string value is preserved somewhere, and could be used in the melt
> operation in some way?
>
> Thanks,
> David
>
> --
> David Anthoff
> University of California, Berkeley
>
> http://www.david-anthoff.com