Usually, the solution to problems like this is to do less per line: break
the expression out, compute each small operation as its own field, then
combine those fields into a final answer. Can you do that here?

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com Book a time on Calendly
<https://calendly.com/rjurney_personal/30min>


On Thu, Feb 23, 2023 at 11:07 AM Oliver Ruebenacker <
oliv...@broadinstitute.org> wrote:

> Here is the complete error:
>
> ```
> Traceback (most recent call last):
>   File "nearest-gene.py", line 74, in <module>
>     main()
>   File "nearest-gene.py", line 62, in main
>     distances = joined.withColumn("distance", max(col("start") -
> col("position"), col("position") - col("end"), 0))
>   File
> "/mnt/yarn/usercache/hadoop/appcache/application_1677167576690_0001/container_1677167576690_0001_01_000001/pyspark.zip/pyspark/sql/column.py",
> line 907, in __nonzero__
> ValueError: Cannot convert column into bool: please use '&' for 'and', '|'
> for 'or', '~' for 'not' when building DataFrame boolean expressions.
> ```
>
> On Thu, Feb 23, 2023 at 2:00 PM Sean Owen <sro...@gmail.com> wrote:
>
>> That error sounds like it's from pandas not spark. Are you sure it's this
>> line?
>>
>> On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <
>> oliv...@broadinstitute.org> wrote:
>>
>>>
>>>      Hello,
>>>
>>>   I'm trying to calculate the distance between a gene (with start and
>>> end) and a variant (with position), so I joined gene and variant data by
>>> chromosome and then tried to calculate the distance like this:
>>>
>>> ```
>>> distances = joined.withColumn("distance", max(col("start") -
>>> col("position"), col("position") - col("end"), 0))
>>> ```
>>>
>>>   Basically, the distance is the maximum of three terms.
>>>
>>>   This line causes an obscure error:
>>>
>>> ```
>>> ValueError: Cannot convert column into bool: please use '&' for 'and',
>>> '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
>>> ```
>>>
>>>   How can I do this? Thanks!
>>>
>>>      Best, Oliver
>>>
>>> --
>>> Oliver Ruebenacker, Ph.D. (he)
>>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>, 
>>> Flannick
>>> Lab <http://www.flannicklab.org/>, Broad Institute
>>> <http://www.broadinstitute.org/>
>>>
>>
>
> --
> Oliver Ruebenacker, Ph.D. (he)
> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>, 
> Flannick
> Lab <http://www.flannicklab.org/>, Broad Institute
> <http://www.broadinstitute.org/>
>
