Control: tags -1 + patch

Hello,

Étienne Mollier, on 2021-10-19:
> After some digging, I found one possible cause of breakage.
> Since biopython 1.79, Bio.Seq.UnknownSeq is deprecated [1], so
> it might be possible that ncbi-acc-download does not parse the
> instantiation of that class very well.
> 
> [1]: https://github.com/biopython/biopython/blob/master/DEPRECATED.rst#biosequnknownseq

The issue turned out to be related, and I came up with a
workaround to smooth the transition to python-biopython 1.79.
The corresponding patch is attached.  Remarks are welcome,
since I'm only half happy with the result, although I tried hard
to make sure it is functionally equivalent.
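
For illustration, the version-independent detection could be
sketched as a small helper along these lines.  This is a minimal
sketch under assumptions, not what the attached patch does: the
helper name has_undefined_sequence is hypothetical, the fallback
for a missing UnknownSeq is only a precaution in case the class
goes away, and str() is used to trigger data access rather than
printing to /dev/null.

```python
try:
    # Available from Biopython 1.79 onwards.
    from Bio.Seq import UndefinedSequenceError
except ImportError:
    # Stand-in for Biopython 1.78 and earlier (or no Biopython at all).
    class UndefinedSequenceError(ValueError):
        """Sequence contents is undefined."""

try:
    # Deprecated in Biopython 1.79; assumed here to possibly be missing.
    from Bio.Seq import UnknownSeq
except ImportError:
    UnknownSeq = None


def has_undefined_sequence(seq):
    """Return True if seq has no actual sequence content (hypothetical helper)."""
    # Biopython 1.78 and earlier: unknown sequences are UnknownSeq instances.
    if UnknownSeq is not None and isinstance(seq, UnknownSeq):
        return True
    # Biopython 1.79 and later: accessing the data of an undefined Seq raises.
    try:
        str(seq)
    except UndefinedSequenceError:
        return True
    return False
```

With such a helper, the loop over records could keep a plain
"if"/"elif"/"else" chain, which avoids mixing the annotation
checks into an "except" block.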

> Looks like I would want to take the issue upstream.

Will do once I have a link to the patch in the BTS.

Have a nice day,  :)
-- 
Étienne Mollier <emoll...@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.
Description: fix wgs download for unknown sequences with biopython 1.79
 Biopython 1.79 deprecated the UnknownSeq class.  As a side effect, Biopython
 internals no longer instantiate UnknownSeq, but a regular Seq with None as
 data and a non-zero length, which breaks checks for objects being instances
 of UnknownSeq.  From 1.79 onwards, unknown sequences can be detected because
 accessing their data raises UndefinedSequenceError, but this error class did
 not exist yet in Biopython 1.78.
 .
 This patch tries to smooth the transition between Biopython 1.78 and 1.79 by
 making the logic support both versions, probably at the cost of complexity
 that will quickly become unnecessary, since "try" and "if" statements don't
 mix very well.
Author: Étienne Mollier <emoll...@debian.org>
Bug-Debian: https://bugs.debian.org/996794
Forwarded: no
Last-Update: 2021-10-20
---
This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
--- ncbi-acc-download.orig/ncbi_acc_download/wgs.py
+++ ncbi-acc-download/ncbi_acc_download/wgs.py
@@ -15,6 +15,7 @@
 
 from io import StringIO
 import time
+import sys
 
 from ncbi_acc_download.download import (
     build_params,
@@ -28,6 +29,11 @@
 try:
     from Bio import SeqIO
     from Bio.Seq import UnknownSeq
+    try:
+        from Bio.Seq import UndefinedSequenceError
+    except ImportError:  # didn't exist in biopython 1.78 and earlier
+        class UndefinedSequenceError(ValueError):
+            """Sequence contents is undefined."""
     HAVE_BIOPYTHON = True
 except ImportError:  # pragma: no cover
     HAVE_BIOPYTHON = False
@@ -101,15 +107,25 @@
     handle.seek(0)
     records = list(SeqIO.parse(handle, config.format))
     for record in records:
-        run_download = isinstance(record.seq, UnknownSeq)
-        if run_download and ('wgs_scafld' in record.annotations or
-                             'wgs' in record.annotations or
-                             'tsa' in record.annotations):
-            updated_records.extend(download_wgs_for_record(record, config))
-        elif run_download and 'contig' in record.annotations:
-            updated_records.extend(fix_supercontigs(record, config))
+        junk = open('/dev/null', mode='w')
+        try:
+            # dummy trigger for Biopython 1.79 and later
+            print(record.seq, file=junk)
+            # adjusted logic for Biopython 1.78 and earlier
+            if isinstance(record.seq, UnknownSeq):
+                raise UndefinedSequenceError
+        except UndefinedSequenceError:
+            if ('wgs_scafld' in record.annotations or
+                'wgs' in record.annotations or
+                'tsa' in record.annotations):
+                updated_records.extend(download_wgs_for_record(record, config))
+            elif 'contig' in record.annotations:
+                updated_records.extend(fix_supercontigs(record, config))
+            else:
+                updated_records.append(record)
         else:
             updated_records.append(record)
+        junk.close()
 
     outhandle = StringIO()
     SeqIO.write(updated_records, outhandle, config.format)
