Re: fasta parser with iopipe?

biocyberman via Digitalmars-d-learn Mon, 28 Aug 2017 07:11:14 -0700

On Wednesday, 23 August 2017 at 13:06:36 UTC, Steven Schveighoffer wrote:

On 8/23/17 5:53 AM, biocyberman wrote:
[...]
I'll respond to all your questions with what I would do, instead of answering each one.
I would suggest an approach similar to how I approached parsing JSON data. In your case, the protocol is even simpler, as there is no nesting.
1. The base layer iopipe should be something that tokenizes the input into reference-based structs. If you look at the jsoniopipe library (https://github.com/schveiguy/jsoniopipe), you can see that the lowest level finds the start of the next JSON token. In your case, it should be looking for
>[...]
This code is pretty straightforward, and roughly corresponds to this:
while(cannot find start sequence in stream)
    stream.extend;
Make sure you aren't re-doing work that has already been done (i.e. save the last place you looked).
Once you have this, you can deduce each packet by the data between the starts.
2. The next layer should validate and parse the data into structs that contain referencing data from the buffer. I recommend not using actual ranges from the buffer, but information on how to build the ranges. The reason for this is that the buffer can move while being streamed by iopipe, so your data could become invalid if you take actual references to the buffer. If you look in the jsoniopipe library, the struct for storing a json item has a start and length, but not a reference to the buffer.
Potentially, you could take this mechanism and build an iopipe on top of the buffered data. This iopipe's elements would be the items themselves, with the underlying buffer hidden in the implementation details. Extending would parse out another set of items; releasing would allow those items to get reclaimed (and the underlying stream data).
This is something I actually wanted to explore with jsoniopipe but didn't have time before the conference. I probably will still build it.
3. Build your real code on top of that layer. What do you want to do with the data? Easiest thing to do for proof of concept is build a range out of the functions. That can allow you to test performance with your lower layers. One of the awesome things about iopipe is testing correctness is really easy -- every string is also an iopipe :)
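The three layers above can be sketched without iopipe itself. The following is a minimal Python sketch, not the iopipe or jsoniopipe API: an ordinary string stands in for the buffered window, and the names RecordSpan, find_record_starts, parse_span and records are hypothetical. It assumes that a FASTA record starts at a '>' that begins a line, and it keeps only offsets into the buffer, so nothing goes stale if the buffer is later moved or extended.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class RecordSpan:
    """Offsets into the buffer, not slices of it (hypothetical name)."""
    header_start: int  # index just after the '>'
    header_len: int
    seq_start: int
    seq_len: int       # raw length, newlines still included


def find_record_starts(buf: str, scan_from: int, starts: List[int]) -> int:
    """Layer 1: append the offset of every '>' that begins a line.

    Returns the position to resume from, so data that was already
    scanned is not scanned again after the buffer has been extended.
    """
    pos = scan_from
    while True:
        pos = buf.find(">", pos)
        if pos == -1:
            return len(buf)
        if pos == 0 or buf[pos - 1] == "\n":
            starts.append(pos)
        pos += 1


def parse_span(buf: str, start: int, end: int) -> RecordSpan:
    """Layer 2: describe the record in buf[start:end] by offsets only."""
    eol = buf.find("\n", start, end)
    if eol == -1:  # header line only, no sequence data yet
        eol = end
    seq_start = min(eol + 1, end)
    return RecordSpan(start + 1, eol - (start + 1), seq_start, end - seq_start)


def records(buf: str) -> Iterator[Tuple[str, str]]:
    """Layer 3: yield (header, sequence) pairs, built only on demand."""
    starts: List[int] = []
    find_record_starts(buf, 0, starts)
    bounds = starts + [len(buf)]
    for start, end in zip(bounds, bounds[1:]):
        span = parse_span(buf, start, end)
        header = buf[span.header_start:span.header_start + span.header_len]
        raw = buf[span.seq_start:span.seq_start + span.seq_len]
        # Sequence lines are joined by dropping all whitespace.
        yield header.strip(), "".join(raw.split())
```

Because the input is just a string, testing needs no file or stream: calling records on a literal FASTA snippet is enough. With a real stream, the value returned by find_record_starts would be passed back in as scan_from after each extension of the buffer.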
I actually worked with a person at dconf on a similar (maybe identical?) format and explained how it could be done in a very similar way. He was looking to remove data that had a low percentage of correctness (or something like that, not in bioinformatics, so I don't understand the real semantics).
With this mechanism in hand, the decompression is pretty easy to chain together with whatever actual stream you have, just use iopipe.zip.
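The idea of chaining a decompression stage in front of the parser can be illustrated independently of iopipe.zip. The following is a minimal Python sketch using the standard zlib module, not the iopipe API; the names decompressed_chunks and count_records are hypothetical, and the downstream stage only counts header lines to keep the example short.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterable, Iterator


def decompressed_chunks(src: BinaryIO, chunk_size: int = 64) -> Iterator[bytes]:
    """Decompression stage: turn a gzip byte stream into plain chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = inflater.decompress(chunk)
        if data:
            yield data
    tail = inflater.flush()
    if tail:
        yield tail


def count_records(chunks: Iterable[bytes]) -> int:
    """Downstream stage: count lines that start with '>' across chunks."""
    count = 0
    at_line_start = True  # state carried across chunk boundaries
    for chunk in chunks:
        for byte in chunk:
            if at_line_start and byte == ord(">"):
                count += 1
            at_line_start = byte == ord("\n")
    return count
```

The downstream stage never sees compressed bytes and does not care where chunk boundaries fall, so the same consumer works on a plain file, a gzip file wrapped in decompressed_chunks, or an in-memory io.BytesIO used for testing.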
Good luck, and email me if you need more help (schvei...@yahoo.com).
-Steve


Hi Nic and Steve

Thank you both very much for your input. I am trying to make use of it. I will try to adapt jsoniopipe for fasta. This is ongoing and broken code: https://github.com/biocyberman/fastaq . PRs are welcome.


@Nic:

I too am very interested in bringing D to bioinformatics. I will be happy to share the information I have. Feel free to email me at vql(.at.)rn.dk and we can talk further about it.

@Steve: Yes, we talked at dconf 2017. I had to do other things, so my D learning slowed down. I am trying the Fasta format before jumping to Fastq again. jsoniopipe is a full-featured and relatively small project, which can be used as a case study. However, there are some aspects I still haven't fully understood. Would I be lucky enough to have you make the currently broken fastaq code work? :) That would definitely save me time and the headache of dealing with newbie problems.
