Hello,

As a software developer, I've used git for years and have found it the perfect solution for source control.
Lately I have found myself using git for an unusual use case: modifying DNA/RNA sequences, which are essentially the software/source code of cells and life, and storing them in git. For bacteria and viruses the repos are very small (<10 MB) and compress nicely. At the extreme end of the spectrum, however, a human genome can run to 50 GB, or roughly 1 GB per file/chromosome.

This is not the binary problem, and it is not the same as storing media inside git. I have reviewed the solutions that exist for the binary problem, such as git-annex, git-media and bup, but they don't provide git's feature set, and the data I'm storing is more like plain-text source code with relatively small edits per commit. I have googled and asked in #git, where the discussion mostly revolved around these tools. The only project of interest is git-bigfiles, from 2009; however, it is a bit dated and, unfortunately, the author is not interested in reviving it and referred me to git-annex instead.

With that background, I wanted to discuss the problems git has with files this large and how I can contribute to the core project to best solve them.

From my understanding, the largest problem revolves around git's delta discovery method, which holds 2 files in memory at once. Is there a reason this could not be adapted to page/chunk the data in a sliding-window fashion? Are there any other issues I need to know about? Is anyone else working on making git more capable of handling large source files whom I could collaborate with?

Thanks for your time,
Jarrad
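A rough sketch of the sliding-window idea, assuming content-defined chunking (the rolling-hash technique behind bup's hashsplitting, not what git's own delta code does): a rolling hash over a small window of bytes marks chunk boundaries, each chunk is hashed, and two versions of a file are compared by chunk digests. Only the current read buffer, the current chunk and the old version's set of digests are held in memory, never both files in full. The names `chunk_stream` and `shared_bytes`, and the window and mask sizes, are hypothetical choices for illustration.

```python
import hashlib
import io
import random
from collections import deque
from typing import BinaryIO, Iterator, Tuple

WINDOW = 48               # bytes in the rolling-hash window (illustrative choice)
MASK = (1 << 13) - 1      # boundary when low 13 bits are zero: ~8 KiB average chunks
BASE = 263
MOD = (1 << 61) - 1       # Mersenne prime modulus for the polynomial rolling hash


def chunk_stream(f: BinaryIO, read_size: int = 1 << 20) -> Iterator[Tuple[int, int, str]]:
    """Yield (offset, length, sha1) for content-defined chunks of a stream.

    A boundary depends only on the last WINDOW bytes, so an edit shifts
    boundaries only near the edit; everything after it chunks identically.
    """
    pow_out = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    h = 0
    window: deque = deque()
    chunk = bytearray()
    offset = 0
    while True:
        block = f.read(read_size)
        if not block:
            break
        for b in block:
            chunk.append(b)
            window.append(b)
            h = (h * BASE + b) % MOD
            if len(window) > WINDOW:
                # Drop the oldest byte's contribution from the hash.
                h = (h - window.popleft() * pow_out) % MOD
            if len(window) == WINDOW and (h & MASK) == 0:
                yield offset, len(chunk), hashlib.sha1(chunk).hexdigest()
                offset += len(chunk)
                chunk = bytearray()
    if chunk:
        yield offset, len(chunk), hashlib.sha1(chunk).hexdigest()


def shared_bytes(old: BinaryIO, new: BinaryIO) -> Tuple[int, int]:
    """Return (bytes of `new` found in chunks of `old`, total bytes of `new`)."""
    old_digests = {digest for _, _, digest in chunk_stream(old)}
    reused = total = 0
    for _, length, digest in chunk_stream(new):
        total += length
        if digest in old_digests:
            reused += length
    return reused, total


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.choice(b"ACGT") for _ in range(300_000))
    new = old[:150_000] + b"GATTACA" + old[150_000:]  # a small insertion
    reused, total = shared_bytes(io.BytesIO(old), io.BytesIO(new))
    print(f"{reused}/{total} bytes of the new version found in old chunks")
```

A real implementation would also need minimum and maximum chunk sizes (highly repetitive sequence can otherwise produce very long chunks), a much faster rolling hash than a pure-Python byte loop, and an actual delta encoding of the chunks that differ.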