Hey Ian,

Thanks for getting back to me. I have finished my proposal draft and sent
it to you via email. I will be waiting for your feedback.

Thanks,
Mohamed Hossam

On Fri, Mar 27, 2026 at 7:59 PM Ian Maxon <[email protected]> wrote:

> Hey Mohamed,
> Great to hear from you again. Sorry for taking a bit to reply.
> Excellent prototype; it's exactly what I had in mind. Very nice work.
> I checked it out and ran it, and it worked perfectly. Using Jackson for
> JSON handling makes perfect sense; it's used very heavily in AsterixDB
> internally as well for a variety of things.
> Those next steps sound great. You might find this document useful for
> the low-level details of the data model:
>
> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> ADM is basically an extension of JSON. I think even that document is a
> bit out of date; we also support a 'geometry' type, which is a GeoJSON
> field.
> Some kind of integration test also sounds good. It's always a bit
> tricky coordinating things between two projects that depend on one
> another, so feel free to ask about any questions or difficulties you
> come across in that.
>
> To answer your two questions:
> 1. Sure, you can send the proposal draft to me and I'd be happy to
> give it a look.
> 2. I would defer to the GSoC guide about this, but I think it means
> 350 hours over the 12 weeks of the program. It doesn't have to be 8
> hours every day, but it should be about that much.
> I put this project as "large" because even though it's conceptually
> straightforward, there is a lot of surface area that is important to
> get right. I think the code in the area of metadata management is also
> kind of convoluted and hard to read. I didn't want there to be time
> pressure when there are a lot of sticky details.
> If it ends up being a bit easier than it seems, there are some
> adaptations or extensions to the project that could easily fill up the
> rest of the time. For example, if it seems like the translator is
> working perfectly externally, then the next step could be integrating
> it into the main codebase as a datasource or special function that
> returns everything. There are also many variants in terms of how to
> represent the backups of datasets. The most straightforward,
> conceptually, is to have inserts for each record. However, INSERT
> statements have poor performance in AsterixDB for a variety of
> reasons. Therefore, one improvement could be to dump datasets as JSONL
> files, and then have the DDL load them with COPY FROM or LOAD
> statements instead.
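
The JSONL variant described above could be sketched roughly as follows. This is an illustration in Python, not AsterixDB or prototype code: `dump_jsonl` and `load_statement` are hypothetical helper names, and the records are assumed to be already fetched as plain dicts. The `LOAD DATASET ... USING localfs` text and the "json" format value are assumptions that should be checked against the current AsterixDB SQL++ reference before use.

```python
import json


def dump_jsonl(records, path):
    """Write one JSON object per line (JSONL) and return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_statement(dataset, path, host="127.0.0.1"):
    """Build a single bulk-load statement for a dumped file.

    Assumption: the localfs adapter syntax and the "json" format value
    are illustrative only; verify both against the AsterixDB docs.
    """
    return (
        f"LOAD DATASET {dataset} USING localfs "
        f'(("path"="{host}://{path}"),("format"="json"));'
    )


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    dump_jsonl(rows, "users.jsonl")
    print(load_statement("Users", "/tmp/users.jsonl"))
```

The point of the design is that the restore script then carries one load statement per dataset instead of one INSERT per record.
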
>
> Best,
> - Ian
>
> On Fri, Mar 27, 2026 at 6:34 AM Mohamed Hossam <[email protected]>
> wrote:
> >
> > Hey everyone,
> >
> > I'm Mohamed Hossam, a recent CS graduate who's interested in database
> > systems. I'm currently writing a proposal for the "Backup/restore utility
> > for AsterixDB [ASTERIXDB-3697
> > <https://issues.apache.org/jira/browse/ASTERIXDB-3697>]" project for
> > GSoC 2026. I emailed the potential mentor for this project a couple of
> > weeks ago, and he instructed me to run AsterixDB locally and investigate
> > the code in asterixdb-metadata. So, I did as suggested by my mentor and
> > I'm excited to share what I learnt.
> >
> > I managed to create a very minimal prototype that can query AsterixDB and
> > generate basic CREATE and INSERT statements from its data. You can find
> > my work at: [m0hossam/asterixdb-dump
> > <https://github.com/m0hossam/asterixdb-dump/>]. I used FasterXML's
> > Jackson JSON parser and tried to recreate the metadata objects from the
> > parsed JSON. Of course, this is only a proof of concept; I'm deliberately
> > ignoring complex statements just to get the prototype up and running to
> > test the feasibility of the project. The actual implementation will
> > require a deeper understanding of AsterixDB's metadata and catalog.
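
For illustration, the INSERT-generation step described above could look roughly like the sketch below. It is written in Python for brevity and is not the prototype's code (the prototype is Java with Jackson). `insert_statements` is a hypothetical name, and the sketch assumes records that contain only plain JSON types, so that the JSON text can stand in directly as the object literal inside an `INSERT INTO ... ([...]);` statement.

```python
import json


def insert_statements(dataset, records):
    """Yield one INSERT statement per record.

    Assumption: each record holds only plain JSON types, so its JSON
    text is used as-is for the object literal. ADM-specific types
    (for example datetime or geometry) would need their own handling.
    """
    for record in records:
        literal = json.dumps(record, ensure_ascii=False)
        yield f"INSERT INTO {dataset} ([{literal}]);"


if __name__ == "__main__":
    for stmt in insert_statements("Users", [{"id": 1, "name": "a"}]):
        print(stmt)
```
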
> >
> > My next steps would be:
> >
> >    - Dive deeper into asterixdb-metadata and gain a better understanding
> >    of the data model.
> >    - Potentially contribute to AsterixDB if I find something to improve
> >    in the relevant code areas.
> >    - Write automated unit tests to compare queries from the original
> >    database with queries from the database generated by my prototype's
> >    dump, ensuring database integrity.
> >
> >
> > I have two questions regarding the project:
> >
> >    1. Should I send my technical proposal to the potential mentor of the
> >    project for review before submitting it through the official website?
> >    2. The project size is supposedly "~350 hour (large)". What does this
> >    mean in terms of time commitment? Will the project have an extended
> >    timeline? Or does the project require 8 hours of work per day during
> >    the 3 months of coding?
> >
> >
> > Best regards,
> > Mohamed Hossam
>
