Hey Ian,

Thanks for getting back to me. I have finished my proposal draft and sent it to you via email. I look forward to your feedback.

Thanks,
Mohamed Hossam

On Fri, Mar 27, 2026 at 7:59 PM Ian Maxon <[email protected]> wrote:
> Hey Mohamed,
>
> Great to hear from you again. Sorry for taking a bit to reply.
>
> Excellent prototype, it's exactly what I had in mind. Very nice work. I checked it out and ran it and it worked perfectly. Using Jackson for JSON handling makes perfect sense; it's used very heavily in AsterixDB internally as well for a variety of things.
>
> Those next steps sound great. You might find this document useful for the low-level details of the data model:
>
> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> ADM is basically an extension of JSON. I think even that document is a bit out of date; we also support a 'geometry' type, which is a GeoJSON field.
>
> Some kind of integration test also sounds good. It's always a bit tricky coordinating things between two projects that depend on one another, so feel free to ask about any questions or difficulties you come across.
>
> To answer your two questions:
> 1. Sure, you can send the proposal draft to me and I'd be happy to give it a look.
> 2. I would defer to the GSoC guide about this, but I think it means 350 hours over the 12 weeks of the program. It doesn't have to be 8 hours every day, but it should be about that much.
>
> I put this project as "large" because even though it's conceptually straightforward, there is a lot of surface area that is important to get right. I think the code in the area of metadata management is also kind of convoluted and hard to read. I didn't want there to be time pressure when there are a lot of sticky details.
>
> If it ends up being a bit easier than it seems, there are some adaptations or extensions to the project that could easily fill up the rest of the time.
> For example, if the translator seems to be working perfectly externally, the next step could be integrating it into the main codebase as a datasource or a special function that returns everything. There are also many variants in terms of how to represent the backups of datasets. The most straightforward, conceptually, is to have an INSERT for each record. However, INSERT statements have poor performance in AsterixDB for a variety of reasons. Therefore, one improvement could be to dump datasets as JSONL files and then have the DDLs load them with COPY FROM or LOAD statements instead.
>
> Best,
> - Ian
>
> On Fri, Mar 27, 2026 at 6:34 AM Mohamed Hossam <[email protected]> wrote:
> >
> > Hey everyone,
> >
> > I'm Mohamed Hossam, a recent CS graduate who's interested in database systems. I'm currently writing a proposal for the "Backup/restore utility for AsterixDB [ASTERIXDB-3697 <https://issues.apache.org/jira/browse/ASTERIXDB-3697>]" project for GSoC 2026. I emailed the potential mentor for this project a couple of weeks ago, and he instructed me to run AsterixDB locally and investigate the code in asterixdb-metadata. So I did as he suggested, and I'm excited to share what I learnt.
> >
> > I managed to create a very minimal prototype that can query AsterixDB and generate basic CREATE and INSERT statements from its data. You can find my work at: [m0hossam/asterixdb-dump <https://github.com/m0hossam/asterixdb-dump/>]. I used FasterXML's Jackson JSON parser and tried to recreate the metadata objects from the parsed JSON. Of course, this is only a proof of concept: I'm deliberately ignoring complex statements to get the prototype up and running and test the feasibility of the project. The actual implementation will require a deeper understanding of AsterixDB's metadata and catalog.
> >
> > My next steps would be:
> >
> > - Dive deeper into asterixdb-metadata and gain a better understanding of the data model.
> > - Potentially contribute to AsterixDB if I find something to improve in the relevant code areas.
> > - Write automated tests that compare query results from the original database with those from the database restored from my prototype's dump, ensuring database integrity.
> >
> > I have two questions regarding the project:
> >
> > 1. Should I send my technical proposal to the potential mentor of the project for review before submitting it through the official website?
> > 2. The project size is supposedly "~350 hour (large)". What does this mean in terms of time commitment? Will the project have an extended timeline? Or does the project require 8 hours of work per day during the 3 months of coding?
> >
> > Best regards,
> > Mohamed Hossam
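Ian's suggestion of dumping each dataset as a JSONL file plus a single bulk-load statement, instead of one INSERT per record, can be sketched as follows. This is a minimal, hypothetical Java sketch: the class `DumpSketch` and its methods are illustrative names, not part of asterixdb-dump or AsterixDB. It assumes each record has already been serialized to compact, single-line JSON (for example with Jackson) and holds only JSON-compatible values; ADM-specific types such as datetime or geometry would need extra handling. The `localfs` options in the LOAD statement are an assumption modeled on AsterixDB's local filesystem adapter and should be verified against the current AsterixDB documentation, as should the COPY FROM alternative.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper contrasting two backup layouts for one dataset.
 * Assumption: every record string is compact, single-line JSON
 * (e.g. produced by Jackson) with only JSON-compatible values.
 */
class DumpSketch {

    /** Layout 1: one INSERT statement per record (simple, but slow to replay). */
    static String insertScript(String dataset, List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String record : records) {
            // SQL++ object constructors accept JSON object syntax directly.
            sb.append("INSERT INTO ").append(dataset)
              .append(" (").append(record).append(");\n");
        }
        return sb.toString();
    }

    /** Layout 2a: JSON Lines body, one record per newline-terminated line. */
    static String jsonLines(List<String> records) {
        for (String record : records) {
            // A raw newline would split a record across two JSONL lines.
            if (record.contains("\n") || record.contains("\r")) {
                throw new IllegalArgumentException("record must be single-line JSON");
            }
        }
        return records.stream().map(r -> r + "\n").collect(Collectors.joining());
    }

    /**
     * Layout 2b: a single bulk-load statement for the JSONL file.
     * Assumption: the localfs "path"/"format" options; verify before use.
     */
    static String loadStatement(String dataset, String host, String absolutePath) {
        return "LOAD DATASET " + dataset + " USING localfs ((\"path\"=\""
                + host + "://" + absolutePath + "\"), (\"format\"=\"json\"));";
    }

    public static void main(String[] args) {
        List<String> records = List.of(
                "{\"id\":1,\"name\":\"Ann\"}",
                "{\"id\":2,\"name\":\"Bo\"}");
        System.out.print(insertScript("Customers", records));
        System.out.print(jsonLines(records));
        System.out.println(loadStatement("Customers", "127.0.0.1", "/backups/Customers.jsonl"));
    }
}
```

The trade-off matches the one Ian describes. The INSERT script is self-contained but replays one statement per record. The JSONL layout moves the data into a separate file that a single bulk-load statement can ingest.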
