Usually, the solution to these problems is to do less per line: break the expression apart, compute each small operation as its own field, then combine those fields into the final answer. Can you do that here?
Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com Book a time on Calendly
<https://calendly.com/rjurney_personal/30min>

On Thu, Feb 23, 2023 at 11:07 AM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:

> Here is the complete error:
>
> ```
> Traceback (most recent call last):
>   File "nearest-gene.py", line 74, in <module>
>     main()
>   File "nearest-gene.py", line 62, in main
>     distances = joined.withColumn("distance", max(col("start") - col("position"), col("position") - col("end"), 0))
>   File "/mnt/yarn/usercache/hadoop/appcache/application_1677167576690_0001/container_1677167576690_0001_01_000001/pyspark.zip/pyspark/sql/column.py", line 907, in __nonzero__
> ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
> ```
>
> On Thu, Feb 23, 2023 at 2:00 PM Sean Owen <sro...@gmail.com> wrote:
>
>> That error sounds like it's from pandas not spark. Are you sure it's this
>> line?
>>
>> On Thu, Feb 23, 2023, 12:57 PM Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to calculate the distance between a gene (with start and
>>> end) and a variant (with position), so I joined gene and variant data by
>>> chromosome and then tried to calculate the distance like this:
>>>
>>> ```
>>> distances = joined.withColumn("distance", max(col("start") - col("position"), col("position") - col("end"), 0))
>>> ```
>>>
>>> Basically, the distance is the maximum of three terms.
>>>
>>> This line causes an obscure error:
>>>
>>> ```
>>> ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
>>> ```
>>>
>>> How can I do this? Thanks!
>>>
>>> Best, Oliver
>>>
>>> --
>>> Oliver Ruebenacker, Ph.D. (he)
>>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>,
>>> Flannick Lab <http://www.flannicklab.org/>, Broad Institute
>>> <http://www.broadinstitute.org/>