Perfect. Thanks! 

On Friday, January 2, 2015 8:17:52 PM UTC-5, Sean Garborg wrote:
>
> Thanks for reporting -- it is a bug. Having an Array or DataArray with 
> NAtype as its eltype is a little awkward. Here's why it's causing you 
> trouble, and a couple of alternatives:
>
> using DataFrames
> nrows = 3
> a = DataFrame(A = 1:nrows)
>
> # Column :A is all NA for all of these cases
> b1 = DataFrame(A = fill(NA, nrows))
> b2 = DataFrame(A = DataArray(Int, nrows))
> b3 = DataFrame(A = DataArray(None, nrows))
>
> vcat(a, b1) # ERROR: no method matching convert(::Type{Int64}, ::DataArrays.NAtype)
> vcat(a, b2) # okay
> vcat(a, b3) # okay
>
> It should probably work as is (if not, I guess the promotion rules should 
> change, and the result should be of type Any or there should be a more 
> informative error).
>
> I opened an issue: https://github.com/JuliaStats/DataArrays.jl/issues/134, 
> but given that most interested developers are focused on coming up with a 
> replacement for DataArrays and NAtype, it may not get attention at the 
> moment, so I'd avoid creating that ambiguous array if possible for now.
>
>
>
> For your other question, conversion of columns, you'll generally use 
> functions from Base Julia or DataArrays.jl to transform data however you 
> like.
>
> Categorical variables are (for the moment) represented using 
> PooledDataArrays, so:
> pdata(abstract_array) or convert(PooledDataArray, abstract_array)
>
> And for strings:
> map(string, abstract_array) or convert(some_string_type, abstract_array)
>
>
> On Friday, January 2, 2015 3:05:31 PM UTC-7, Guillaume Guy wrote:
>>
>> Sean:
>>
>> I found the problem. Not sure if that is a "bug" per se.
>>
>> Looking at one element of the Array (which is subsequently vcat-ed):
>>
>>
>> <https://lh5.googleusercontent.com/-qE0qADLTofE/VKcS_9LqP4I/AAAAAAAADsw/WqliDGO7Lnk/s1600/dfs.PNG>
>>
>> Note the NA in the equipment column. When running my function 
>> (intermediary_point) on each row of my input dataframe, equipment (which is 
>> a String column) becomes an NA of type NAtype. As a result, the equipment 
>> column of the resulting dataframe (see above) now has type NAtype.
>>
>> Anyway ... You end up with dfs in which some elements have column types like this:
>>
>> 7-element Array{Type{T<:Top},1}:
>>  UTF8String
>>  NAtype    
>>  UTF8String
>>  UTF8String
>>  Int64     
>>  Float64   
>>  Float64
>>
>>
>> and some elements with the correct type. The vcat returns a convert error 
>> trying to convert the NAtype into String.
>>
>>
>> Is it a bug? Shouldn't vcat convert the NAtype into String?
>>
>>
>> Another question I have is about how to convert a column type within an 
>> existing dataframe. I'm looking for a Julia equivalent of R's *as.factor* 
>> or *as.string*. Alternatively, when running DataFrame(A=1:20,B=1:20), is 
>> there a way to specify what types A and B should be?
>>
>>
>> Thx! 
>>
>>
>>
>> On Wednesday, December 31, 2014 10:42:30 PM UTC-5, Sean Garborg wrote:
>>>
>>> If you Pkg.update() and try again, you should be fine. DataFrames was 
>>> overdue for a tagged release -- you'll get v0.6.0 which includes some 
>>> updates to vcat. As a gut check, this works just fine:
>>>
>>> using DataFrames
>>> dfs = [DataFrame(Float64, 15, 15) for _=1:200_000]
>>> vcat(dfs)
>>>
>>> (If it doesn't for you, definitely file an issue.)
>>>
>>> Happy New Year,
>>> Sean
>>>
>>> On Thursday, December 25, 2014 5:06:23 PM UTC-7, Guillaume Guy wrote:
>>>>
>>>> Hi David:
>>>>
>>>> That is where the stack overflow error is thrown.
>>>>
>>>> I attached the code + the data in my first post for your reference.
>>>>
>>>>
>>>> On Thursday, December 25, 2014 6:59:57 PM UTC-5, David van Leeuwen 
>>>> wrote:
>>>>>
>>>>> Hello Guillaume, 
>>>>>
>>>>> On Monday, December 22, 2014 9:09:16 PM UTC+1, Guillaume Guy wrote:
>>>>>>
>>>>>> Dear Julia users:
>>>>>>
>>>>>> Coming from an R background, I like to work with lists of dataframes, 
>>>>>> which I can reduce by doing do.call('rbind',list_of_df) 
>>>>>>
>>>>> After ~10 years of using R, I only recently learned of do.call(). 
>>>>>
>>>>> In Julia, you would say:
>>>>>
>>>>> vcat(dfs...)
>>>>>
>>>>> ---david
>>>>>  
>>>>>
>>>>>> In Julia, I attempted to use vcat for this purpose but I ran into 
>>>>>> trouble:
>>>>>>
>>>>>> "
>>>>>>
>>>>>> stack overflow
>>>>>> while loading In[29], in expression starting on line 1
>>>>>>
>>>>>> "
>>>>>>
>>>>>>
>>>>>> This operation is basically the vcat of a large vector v consisting 
>>>>>> of 68K small (11X7) dataframes. The code is attached.
>>>>>>
>>>>>> Thanks for your help! 
>>>>>>
>>>>>
