dropna() is only defined for DataArrays. The individual columns in a
DataFrame are DataArrays, but the DataFrame itself is not. There is an issue
for it: <https://github.com/JuliaStats/DataFrames.jl/issues/602>.
To get an Array out of a DataFrame, I think you are best off building it
yourself:
complete_cases!(data)
[data[:x1] data[:x2]]
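Here is a rough sketch of the same idea in plain Base Julia. It assumes a current Julia 1.x, where `missing` plays the role of NA, and does not use the 2014 `readtable`/`complete_cases!`/DataArrays API. The two columns are hypothetical stand-ins for data[:x1] and data[:x2]: the first four Connecticut coordinates from the quoted message below, plus one made-up all-missing row. Rows with a missing value are dropped, and the surviving columns are concatenated into an ordinary Matrix{Float64}. That is the kind of input a distance routine expects. `pairwise_euclidean` is a hypothetical hand-written helper, not the Distance package's `pairwise`.

```julia
# Hypothetical stand-ins for data[:x1] and data[:x2]; `missing` marks an NA.
x1 = [-73.3712, -72.1065, -73.2453, -71.9876, missing]
x2 = [41.225, 41.4667, 41.7925, 41.83, missing]

# Keep only rows where every column has a value (the idea behind complete_cases!).
keep = .!ismissing.(x1) .& .!ismissing.(x2)

# Concatenate the surviving columns side by side into a plain Matrix{Float64}.
X = hcat(Float64.(x1[keep]), Float64.(x2[keep]))

# Hypothetical helper: Euclidean distance between every pair of rows (points) of X.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    D = zeros(n, n)
    for i in 1:n, j in (i + 1):n
        d = sqrt(sum(abs2, X[i, :] .- X[j, :]))
        D[i, j] = d
        D[j, i] = d  # the distance matrix is symmetric
    end
    return D
end

D = pairwise_euclidean(X)
```

This sketch treats rows as points. The `temp` matrix in the thread has points as columns, so it would need transposing before being passed to a helper like this.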
On Friday, July 4, 2014 4:07:54 AM UTC+3, Donald Lacombe wrote:
>
> Johan,
>
> I think there may be an issue with the DataFrames package, as I get the
> following:
>
> julia> data = readtable("test.csv",header=false)
> 6x2 DataFrame:
> x1 x2
> [1,] 1 7
> [2,] 2 8
> [3,] 3 9
> [4,] 4 10
> [5,] 5 11
> [6,] 6 12
>
> julia> convert(Array,data)
> MethodError(convert,(Array{T,N},6x2 DataFrame:
> x1 x2
> [1,] 1 7
> [2,] 2 8
> [3,] 3 9
> [4,] 4 10
> [5,] 5 11
> [6,] 6 12
> ))
>
> julia> dropna(data)
> ErrorException("dropna not defined")
>
>
> I read the documentation and they both say the same thing, but it doesn't
> seem to work in my case.
>
>
> Thoughts?
>
>
> Thanks,
>
> Don
>
>
> On Thursday, July 3, 2014 7:54:49 PM UTC-4, Johan Sigfrids wrote:
>>
>> You can use dropna() to convert a DataArray to an Array. This will
>> obviously drop any missing values.
>>
>> On Friday, July 4, 2014 2:08:55 AM UTC+3, Donald Lacombe wrote:
>>>
>>> Patrick (and others),
>>>
>>> Another issue that has reared its ugly head is that when I read the
>>> data using the DataFrames package, I get the following:
>>>
>>> data = readtable("ct_coord_2.csv",header=false)
>>> 8x2 DataFrame:
>>> x1 x2
>>> [1,] -73.3712 41.225
>>> [2,] -72.1065 41.4667
>>> [3,] -73.2453 41.7925
>>> [4,] -71.9876 41.83
>>> [5,] -72.3365 41.855
>>> [6,] -72.7328 41.8064
>>> [7,] -72.5231 41.4354
>>> [8,] -72.8999 41.3488
>>>
>>>
>>> julia> xc = data[:,1]
>>> 8-element DataArray{Float64,1}:
>>> -73.3712
>>> -72.1065
>>> -73.2453
>>> -71.9876
>>> -72.3365
>>> -72.7328
>>> -72.5231
>>> -72.8999
>>>
>>> julia> yc = data[:,2]
>>> 8-element DataArray{Float64,1}:
>>> 41.225
>>> 41.4667
>>> 41.7925
>>> 41.83
>>> 41.855
>>> 41.8064
>>> 41.4354
>>> 41.3488
>>>
>>> julia> xc=xc'
>>> 1x8 DataArray{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>>
>>> julia> yc=yc'
>>> 1x8 DataArray{Float64,2}:
>>> 41.225 41.4667 41.7925 41.83 41.855 41.8064 41.4354 41.3488
>>>
>>> julia> temp = [xc;yc]
>>> 2x8 DataArray{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>> 41.225 41.4667 41.7925 41.83 41.8064 41.4354 41.3488
>>>
>>>
>>> julia> R = pairwise(Euclidean(),temp)
>>> MethodError(At_mul_B!,(
>>> 8x8 Array{Float64,2}:
>>> 2.7273e-316 2.7273e-316 2.67478e-315 … 2.7273e-316 2.7273e-316
>>> 2.67736e-315 2.67736e-315 2.67736e-315 2.72726e-316 2.72726e-316
>>> 2.67727e-315 2.67727e-315 2.67727e-315 2.67727e-315 2.67727e-315
>>> 2.67727e-315 2.67727e-315 2.67727e-315 2.67727e-315 2.67727e-315
>>> 4.94066e-324 4.94066e-324 4.94066e-324 9.88131e-324 4.94066e-324
>>> 2.76235e-318 2.76235e-318 2.76235e-318 … 2.76235e-318 2.76235e-318
>>> 4.94066e-324 4.94066e-324 4.94066e-324 9.88131e-324 4.94066e-324
>>> 4.94066e-324 4.94066e-324 4.94066e-324 9.88131e-324 4.94066e-324,
>>>
>>> 2x8 DataArray{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>> 41.225 41.4667 41.7925 41.83 41.8064 41.4354 41.3488,
>>>
>>> 2x8 DataArray{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>> 41.225 41.4667 41.7925 41.83 41.8064 41.4354 41.3488))
>>>
>>>
>>> I do not think that the Distance package likes the types being passed into
>>> the function, i.e. the vectors are DataArrays instead of Arrays. It works
>>> just fine when I use Tony's idea:
>>>
>>>
>>> julia> data = readcsv("ct_coord_2.csv",Float64)
>>> 8x2 Array{Float64,2}:
>>> -73.3712 41.225
>>> -72.1065 41.4667
>>> -73.2453 41.7925
>>> -71.9876 41.83
>>> -72.3365 41.855
>>> -72.7328 41.8064
>>> -72.5231 41.4354
>>> -72.8999 41.3488
>>>
>>>
>>> julia> xc = data[:,1]
>>> 8-element Array{Float64,1}:
>>> -73.3712
>>> -72.1065
>>> -73.2453
>>> -71.9876
>>> -72.3365
>>> -72.7328
>>> -72.5231
>>> -72.8999
>>>
>>> julia> yc = data[:,2]
>>> 8-element Array{Float64,1}:
>>> 41.225
>>> 41.4667
>>> 41.7925
>>> 41.83
>>> 41.855
>>> 41.8064
>>> 41.4354
>>> 41.3488
>>>
>>> julia> xc=xc'
>>> 1x8 Array{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>>
>>> julia> yc=yc'
>>> 1x8 Array{Float64,2}:
>>> 41.225 41.4667 41.7925 41.83 41.855 41.8064 41.4354 41.3488
>>>
>>> julia> temp = [xc;yc]
>>> 2x8 Array{Float64,2}:
>>> -73.3712 -72.1065 -73.2453 -71.9876 … -72.7328 -72.5231 -72.8999
>>> 41.225 41.4667 41.7925 41.83 41.8064 41.4354 41.3488
>>>
>>>
>>> julia> R = pairwise(Euclidean(),temp)
>>> 8x8 Array{Float64,2}:
>>> 0.0 1.28762 0.581327 1.51014 … 0.863479 0.873799 0.487347
>>> 1.28762 0.0 1.18451 0.382214 0.712542 0.417808 0.802085
>>> 0.581327 1.18451 0.0 1.25833 0.512668 0.805673 0.562309
>>> 1.51014 0.382214 1.25833 0.0 0.745667 0.665227 1.03141
>>> 1.21144 0.451294 0.910982 0.349837 0.399323 0.459258 0.757372
>>> 0.863479 0.712542 0.512668 0.745667 … 0.0 0.426208 0.487124
>>> 0.873799 0.417808 0.805673 0.665227 0.426208 0.0 0.386557
>>> 0.487347 0.802085 0.562309 1.03141 0.487124 0.386557 0.0
>>>
>>>
>>> There seems to be some issue with the Distance package not accepting the
>>> DataArrays that come out of a DataFrame. Of course, readcsv works fine, but
>>> this might be an issue for others as well.
>>>
>>>
>>> Thanks,
>>>
>>> Don
>>>
>>>
>>>
>>> On Thursday, July 3, 2014 6:49:18 PM UTC-4, Patrick O'Leary wrote:
>>>>
>>>> On Thursday, July 3, 2014 5:36:23 PM UTC-5, Donald Lacombe wrote:
>>>>>
>>>>> I'm no GIS expert (I'm an applied econometrician), but the code I've
>>>>> written seems to work. The Distance package also works with my "real"
>>>>> data, which are the centroids of the counties in Connecticut, and I
>>>>> tested it with Euclidean, Cityblock, and SqEuclidean.
>>>>>
>>>>
>>>> Glad you got something working. Whether those distances are accurate
>>>> enough depends on how the points are arranged and what you plan to do with
>>>> them; I can see where it wouldn't make much difference in this case. I can't
>>>> let the statisticians and image processing folks have all the technical
>>>> conversation fun in this mailing list, though!
>>>>
>>>