Control: tags -1 + patch

Hello,

Étienne Mollier, on 2021-10-19:
> After some digging, I found one possible cause of breakage.
> Since biopython 1.79, Bio.Seq.UnknownSeq is deprecated [1], so
> it might be possible that ncbi-acc-download does not parse the
> instantiation of that class very well.
>
> [1]:
> https://github.com/biopython/biopython/blob/master/DEPRECATED.rst#biosequnknownseq

The issue turned out to be related, and I came up with some hackery to
smooth the transition to python-biopython 1.79.  The corresponding
patch is attached.  I welcome remarks, since I'm only half happy with
the result, although I tried hard to make sure it is functionally
equivalent.

> Looks like I would want to take the issue upstream.

Will do once the patch is in the BTS and I have a link to point to.

Have a nice day,  :)
-- 
Étienne Mollier <emoll...@emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
Sent from /dev/pts/2, please excuse my verbosity.
Description: fix wgs download for unknown sequences with biopython 1.79
 Biopython 1.79 deprecated the UnknownSeq class.  As a side effect,
 Biopython internals no longer instantiate UnknownSeq but create regular
 Seq objects with None data and a non-zero length instead, which breaks
 checks on objects being instances of UnknownSeq.  From 1.79 onwards,
 unknown sequences can be detected because accessing their data raises
 UndefinedSequenceError, but that error class did not exist yet in
 Biopython 1.78.
 .
 This patch tries to smooth the transition between biopython 1.78 and
 1.79 by making the logic support both versions, probably at the cost of
 complexity that will quickly become unnecessary, since "try" and "if"
 statements don't mix very well.
Author: Étienne Mollier <emoll...@debian.org>
Bug-Debian: https://bugs.debian.org/996794
Forwarded: no
Last-Update: 2021-10-20
---
This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
--- ncbi-acc-download.orig/ncbi_acc_download/wgs.py
+++ ncbi-acc-download/ncbi_acc_download/wgs.py
@@ -15,6 +15,7 @@
 
 from io import StringIO
 import time
+import sys
 
 from ncbi_acc_download.download import (
     build_params,
@@ -28,6 +29,11 @@
 try:
     from Bio import SeqIO
     from Bio.Seq import UnknownSeq
+    try:
+        from Bio.Seq import UndefinedSequenceError
+    except ImportError:  # didn't exist in biopython 1.78 and earlier
+        class UndefinedSequenceError(ValueError):
+            """Sequence contents is undefined."""
     HAVE_BIOPYTHON = True
 except ImportError:  # pragma: no cover
     HAVE_BIOPYTHON = False
@@ -101,15 +107,25 @@
     handle.seek(0)
     records = list(SeqIO.parse(handle, config.format))
     for record in records:
-        run_download = isinstance(record.seq, UnknownSeq)
-        if run_download and ('wgs_scafld' in record.annotations or
-                             'wgs' in record.annotations or
-                             'tsa' in record.annotations):
-            updated_records.extend(download_wgs_for_record(record, config))
-        elif run_download and 'contig' in record.annotations:
-            updated_records.extend(fix_supercontigs(record, config))
+        junk = open('/dev/null', mode='w')
+        try:
+            # dummy trigger for Biopython 1.79 and later
+            print(record.seq, file=junk)
+            # adjusted logic for Biopython 1.78 and earlier
+            if isinstance(record.seq, UnknownSeq):
+                raise UndefinedSequenceError
+        except UndefinedSequenceError:
+            if ('wgs_scafld' in record.annotations or
+                    'wgs' in record.annotations or
+                    'tsa' in record.annotations):
+                updated_records.extend(download_wgs_for_record(record, config))
+            elif 'contig' in record.annotations:
+                updated_records.extend(fix_supercontigs(record, config))
+            else:
+                updated_records.append(record)
         else:
             updated_records.append(record)
+        junk.close()
 
     outhandle = StringIO()
     SeqIO.write(updated_records, outhandle, config.format)
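One possible alternative, offered as an untested sketch rather than a replacement for the patch: move the version-dependent detection into a small helper so the original if/elif/else chain can stay as it was. The helper name has_undefined_contents is hypothetical (it is not part of Biopython or ncbi-acc-download). It relies on str() raising UndefinedSequenceError for undefined sequences in Biopython 1.79, which is the same behaviour the print() trigger in the patch exploits, and on the isinstance() check for Biopython 1.78. Guarding the UnknownSeq import is an assumption-driven precaution in case a later release drops the class.

```python
# Sketch only: has_undefined_contents() is a hypothetical helper, not part
# of Biopython or ncbi-acc-download.
try:
    from Bio.Seq import UndefinedSequenceError
except ImportError:  # Biopython 1.78 and earlier (or Biopython missing)
    class UndefinedSequenceError(ValueError):
        """Sequence contents is undefined."""

try:
    from Bio.Seq import UnknownSeq
except ImportError:  # assumption: a later release may drop the class
    UnknownSeq = None


def has_undefined_contents(seq):
    """Return True when the contents of seq are undefined (unknown)."""
    # Biopython 1.78: unknown sequences are UnknownSeq instances.
    if UnknownSeq is not None and isinstance(seq, UnknownSeq):
        return True
    # Biopython 1.79+: converting an undefined Seq to str raises
    # UndefinedSequenceError, as print() did in the patch, but without
    # opening /dev/null. Like print(), this builds the full string.
    try:
        str(seq)
    except UndefinedSequenceError:
        return True
    return False
```

With such a helper, the loop body could keep `run_download = has_undefined_contents(record.seq)` followed by the original if/elif/else chain, so neither the try/else pairing nor the temporary file handle would be needed.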